ON THE RATIO OF THE VARIANCES OF TWO NORMAL POPULATIONS


By Henry Scheffé


Princeton University 
1. Introduction and summary. Suppose that we have two samples $E_1$ and
$E_2$ from normal populations $\pi_1$ and $\pi_2$ with unknown means and variances.
Let us designate by $\theta$ the ratio of the variance of $\pi_1$ to that of $\pi_2$. The two
problems discussed in this paper are to formulate in terms of $E_1$ and $E_2$, and to
compare,

(i) significance tests for the hypothesis that the unknown ratio $\theta$ is equal to a given
positive number $\theta_0$, and

(ii) confidence intervals for $\theta$.

Since, on the one hand, these problems are of considerable importance to the 
practical statistician and the teacher of statistics, and on the other, they cry 
for the application of recently developed theory which is unfortunately not yet
familiar to many practical workers and teachers, the development has been 
divided into two parts: Part I, it is hoped, will be intelligible to the above class 
of readers; part II, slanted toward a smaller circle, is more esoteric, general, and 
condensed. 

More specifically, in part I it is pointed out that any choice of limits on the
F-distribution satisfying the condition that the sum of the areas in the tails
be equal to a prescribed number, leads to solutions of problems (i) and (ii).
After considering and then ruling out the "one-sided" situations in which it is
appropriate to use only one tail, two conditions are proposed (ad hoc and on an
intuitive basis) for the "two-sided" case: a symmetry condition, and a condition
for logarithmically shortest confidence intervals. The second condition
leads to a choice of limits on the F-distribution. From other considerations
(reciprocal limits, likelihood ratio, and equal tails) other choices are advanced.
It is found that all four of these choices satisfy the first condition, and that
furthermore if $N_1 = N_2$, where $N_i$ is the number of variates in $E_i$, then the
four choices become identical. If $N_1 \ne N_2$, which of the four tests is "best"?
Which of the four sets of confidence intervals? For defining and answering the
first question in a logically satisfactory way just a little of the Neyman-Pearson
theory of testing hypotheses suffices. For the second, Neyman's theory of
confidence intervals is called for, and because of its greater difficulty, this has
been relegated to part II. However, the limits determined by the criterion
that the test be unbiased turn out to be the same as those which yield optimum
confidence intervals from the elementary viewpoint of §5. Their numerical
values are unfortunately laborious to calculate accurately if $N_1 \ne N_2$, and part
I concludes with some numerical evidence indicating the loss of efficiency in
using instead the easily found "equal tails" limits. For $N_1$ and $N_2$ = 10 this
loss is seen to be quite small. It will perhaps bear repeating that if $N_1 = N_2$,
the "equal tails" limits on the F-distribution are the same as those associated
with the unbiased test and that hence in this case all the advantages uncovered
in parts I and II for the unbiased test and the related confidence intervals are
obtained by using the easily available "equal tails" limits.

In part II we drop the restriction that the tests be based on a one or two-tailed
use of the F-distribution. By a slight extension of results of Neyman and
Pearson, common best critical regions for testing the hypothesis $\theta = \theta_0$ against
alternatives $\theta < \theta_0$, or $\theta > \theta_0$, are found. Since the regions are always distinct
for these two "one-sided" cases, there is no uniformly most powerful test. In
order to find the most efficient unbiased test some recently published theorems
of the writer are applied to prove that the critical region of the unbiased test
proposed in part I is of type $B_1$.

The fact that the results summarized in the above paragraph are obtained 
for arbitrary positive $\theta_0$ will immediately suggest to the reader familiar with
Neyman’s theory of confidence intervals that it may be easy on the basis of 
those results to draw conclusions about the existence of Neyman’s various cate- 
gories of confidence intervals. It is. In particular we find that the set of 
confidence intervals arrived at in §5 constitutes Neyman’s short unbiased set. 

The writer is aware that not all the results of this paper are new, and hopes 
he has given credit where it is due, but believes it desirable to bring together all 
the results, old and new, in this attempt to clean up the problems (i) and (ii).
He is pleased to acknowledge his debt to Mr. David Votaw for aiding in the 
calculations for fig. 1 and for finding the formulas (6). 


Part I. SIGNIFICANCE TESTS AND CONFIDENCE INTERVALS BASED ON THE 
F-DISTRIBUTION 

2. The F-distribution. The sample $E_i$: $(x_{i1}, x_{i2}, \dots, x_{iN_i})$, $i = 1, 2$, is
assumed to be from a normal population $\pi_i$ with mean $a_i$ and variance $\sigma_i^2$. We
write $\theta = \sigma_1^2/\sigma_2^2$, and might regard the statistic $T$ as an estimate¹ of $\theta$, where
$T = s_1^2/s_2^2$ and

$$ s_i^2 = \sum_{j=1}^{N_i} (x_{ij} - \bar{x}_i)^2 / n_i, \qquad \bar{x}_i = \sum_{j=1}^{N_i} x_{ij} / N_i, \qquad n_i = N_i - 1. $$

It will be convenient to consider $\theta$, $\sigma_2^2$, $a_1$, $a_2$ as the population parameters, $\sigma_1^2$
being eliminated from the joint p.d.f. (probability density function) of $E_1$
and $E_2$ by the substitution $\sigma_1^2 = \theta\sigma_2^2$. For any given positive number $\theta_0$ we
define the composite hypothesis

$$ H_0: \quad \theta = \theta_0, \quad 0 < \sigma_2^2 < +\infty, \quad -\infty < a_1 < +\infty, \quad -\infty < a_2 < +\infty. $$

In Hotelling's apt terminology the last three parameters are nuisance parameters.

It is well known that $U_1$ and $U_2$, where $U_i = n_i s_i^2/\sigma_i^2$, are independently
distributed according to $\chi^2$-laws with $n_1$ and $n_2$ degrees of freedom respectively,
and that hence the quotient $F = (U_1/n_1) \div (U_2/n_2) = T/\theta$ has the F-distribution
$h_{n_1 n_2}(F)\,dF$ with $n_1$ and $n_2$ degrees of freedom, where

$$ h_{n_1 n_2}(u) = \frac{(n_1/n_2)^{\frac12 n_1}}{B(\frac12 n_1, \frac12 n_2)}\, u^{\frac12 n_1 - 1} \left(1 + \frac{n_1}{n_2}u\right)^{-\frac12 (n_1 + n_2)}, \qquad 0 \le u \le \infty. $$

For later reference we note that if we define the variable $x$ from

$$ F = \frac{n_2}{n_1}\,\frac{x}{1 - x}, \tag{1} $$

then the cumulative distribution function of $x$ is the incomplete Beta function²
$I_x(\frac12 n_1, \frac12 n_2)$.
Let $\alpha$ be any number such that $0 < \alpha < 1$ ($\alpha$ will be the significance level
for (i); $1 - \alpha$, the confidence coefficient for (ii)). The symbols $A_{n_1 n_2}$, $B_{n_1 n_2}$
will always denote a pair of numbers for which³

$$ \int_{A_{n_1 n_2}}^{B_{n_1 n_2}} h_{n_1 n_2}(u)\,du = 1 - \alpha. \tag{2} $$

Every choice of the pair $A$, $B$ leads to a solution of problems (i) and (ii):

(i). A test of $H_0$ at significance level $\alpha$ consists of rejecting $H_0$ if $T < A_{n_1 n_2}\theta_0$ or
$T > B_{n_1 n_2}\theta_0$.

The probability of rejecting $H_0$ if it is true is

$$ 1 - \Pr(A\theta_0 \le T \le B\theta_0 \mid \theta_0) = 1 - \Pr(A \le T/\theta_0 \le B \mid \theta_0) = \alpha, $$

independently of the true values of the nuisance parameters.


1 Biased. 

2 All the results of this paper pertaining to the F-distribution could of course be stated
in terms of Fisher's z-distribution [2] or the incomplete Beta distribution; the first is used
here because of its popularity in applied statistics, and because it permits the simplest
statements for solutions of problems (i) and (ii).

3 Superscripts on $A$, $B$ will signify that a further condition has been laid on the pair
$A$, $B$. The subscripts will be dropped when there is no danger of confusion. We permit
$B = \infty$ as a possible choice.
(ii). A set of confidence intervals for $\theta$ with confidence coefficient $1 - \alpha$ is⁴

$$ T/B_{n_1 n_2} \le \theta \le T/A_{n_1 n_2}. $$

The probability that the true value of $\theta$ will be covered by the above random
interval is

$$ \Pr(T/B \le \theta \le T/A \mid \theta) = \Pr(A \le T/\theta \le B \mid \theta) = 1 - \alpha, $$

whatever be the true values of $\theta$ and the nuisance parameters.
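The two recipes of this section can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the limits A and B must be supplied by the user (they are not computed here), and the numerical limits in the usage lines are the familiar upper 2.5% point of the F-table for 4 and 4 degrees of freedom (about 9.60) and its reciprocal.

```python
import math
import statistics


def variance_ratio_interval(sample1, sample2, A, B):
    """Return (T, lower, upper) with T = s1^2 / s2^2 and T/B <= theta <= T/A.

    A and B are assumed to be limits on the F-distribution with
    n1 = len(sample1) - 1 and n2 = len(sample2) - 1 degrees of freedom
    that satisfy condition (2); they are inputs, not computed here.
    """
    s1_sq = statistics.variance(sample1)  # divisor n1 = N1 - 1
    s2_sq = statistics.variance(sample2)  # divisor n2 = N2 - 1
    T = s1_sq / s2_sq
    lower = T / B                          # equals 0.0 when B is infinite
    upper = T / A if A > 0 else math.inf   # semi-infinite interval when A = 0
    return T, lower, upper


def rejects_h0(T, theta0, A, B):
    """Test (i): reject H0: theta = theta0 if T < A*theta0 or T > B*theta0."""
    return T < A * theta0 or T > B * theta0


# Usage with two small illustrative samples (N1 = N2 = 5, so n1 = n2 = 4).
B_limit = 9.60            # tabulated upper 2.5% point, 4 and 4 d.f. (approximate)
A_limit = 1.0 / B_limit   # lower limit from the reciprocal relation for n1 = n2
T, lo, hi = variance_ratio_interval([1, 2, 3, 4, 5], [2, 4, 6, 8, 10], A_limit, B_limit)
print(T, lo, hi, rejects_h0(T, 1.0, A_limit, B_limit))
```

The same interval function covers the one-tailed cases: passing A = 0 or B = infinity yields the semi-infinite intervals.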

It will be convenient to adopt a brief notation for the tests and confidence
intervals determined by certain choices of the limits $A$, $B$. In the sequel we
shall denote these choices by $A^i_{n_1 n_2}$, $B^i_{n_1 n_2}$, where $i$ = I, II, ..., VI. We
shall call the significance test based on the pair $A^i$, $B^i$ the test $i$, and the set of
confidence intervals based on this pair, the set $i$ of confidence intervals, or sometimes
more briefly, the confidence intervals $i$.





















3. Use of one tail. Suppose a situation in which we do not mind accepting
$H_0$ if the true value of $\theta$ exceeds $\theta_0$, but we desire a test which is as sensitive as
possible in rejecting $H_0$ when $\theta < \theta_0$. It can be shown (for $n_2 > 2$) that the
expected value of $T$ is $\mathcal{E}(T) = n_2\theta/(n_2 - 2)$, and hence when the true value of $\theta$
is small compared with $\theta_0$, so is $\mathcal{E}(T)$. By the usual intuitive considerations we
are led to rejecting $H_0$ if $F = T/\theta_0$ falls in the left tail of the F-distribution. To
make the significance level equal to $\alpha$ we take the limits $A$, $B$ so that

$$ \int_0^{A^{\mathrm{I}}_{n_1 n_2}} h_{n_1 n_2}(u)\,du = \alpha, \qquad B^{\mathrm{I}}_{n_1 n_2} = \infty. $$

Similarly, to test $H_0$ against alternatives $\theta > \theta_0$ we define test II by

$$ A^{\mathrm{II}}_{n_1 n_2} = 0, \qquad \int_{B^{\mathrm{II}}_{n_1 n_2}}^{\infty} h_{n_1 n_2}(u)\,du = \alpha. $$

Why test I is best for testing $H_0$ against alternatives $\theta < \theta_0$, and test II for
$\theta > \theta_0$, will be explained more convincingly in §9.

The confidence intervals I and II are then semi-infinite. It is apparent that
if we are not loath to accept large values of $\theta$ but wish to exclude the largest
possible interval of small values $(0, T/B)$, we should use the set II. Indeed,
the set II is optimum in the case where we are willing to accept values of $\theta$ larger
than the true value but desire the highest possible probability of excluding any
values less than the true value; however, the precise formulation and proof
of this statement must be postponed to part II. Analogous remarks apply to
the set I and a willingness to accept values of $\theta$ less than the true value.

For $\alpha = .05$ or $.01$ the values of $B^{\mathrm{II}}_{n_1 n_2}$ are given in Snedecor's F-tables [12;
same $n_1$, $n_2$ as ours], and the values of $A^{\mathrm{I}}_{n_1 n_2}$ may be calculated from the same
tables by using the relation

$$ A^{\mathrm{I}}_{n_1 n_2} B^{\mathrm{II}}_{n_2 n_1} = 1. \tag{3} $$

$A^{\mathrm{I}}_{n_1 n_2}$ for $\alpha = .50, .25, .10, .025, .005$ may be obtained by use of the transformation
(1) and Thompson's new tables [13] of percentage points for the incomplete
Beta distribution. $B^{\mathrm{II}}_{n_1 n_2}$ for these values of $\alpha$ can then be found from (3).


4 If $B = \infty$ we omit the equality sign to the left of $\theta$; if $A = 0$, the equality sign to the
right of $\theta$.


4. Symmetry condition. We now restrict our attention (until §9) to the
"two-sided" situation in which we are interested in all alternatives to $\theta = \theta_0$
on the range $0 < \theta < \infty$. Let us contemplate the following symmetry condition:

$$ A_{n_1 n_2} B_{n_2 n_1} = 1 \tag{4} $$

for all positive integers $n_1$, $n_2$. The desirability of this condition and that of
§5 follows not from mathematical principles but from practical considerations
which might be relevant whenever significance tests or confidence intervals are
considered for a parameter $\theta$ which is the quotient of two other positive parameters
$\theta_1$ and $\theta_2$, and the estimate of $\theta$ is the quotient of the estimates of $\theta_1$ and $\theta_2$.

Suppose that given the samples $E_1$ and $E_2$, computer $C_1$ labels them 1, 2,
the same way we have, and using our test of §2, rejects the hypothesis that
$\sigma_1^2/\sigma_2^2 = k$ unless

$$ A_{n_1 n_2} k \le s_1^2/s_2^2 \le B_{n_1 n_2} k, $$

while computer $C_2$ labels them 2, 1, and following a similar rule rejects $\sigma_2^2/\sigma_1^2 = 1/k$
(in our notation) unless

$$ A_{n_2 n_1}/k \le s_2^2/s_1^2 \le B_{n_2 n_1}/k. $$

It will be seen that (4) is merely the condition that they reach the same conclusion.
This makes life simpler, at least for computers and consulting statisticians.
Likewise, if $C_1$ and $C_2$ use the confidence intervals of §2, then they will
make numerically equivalent statements about $\sigma_1^2/\sigma_2^2$ and $\sigma_2^2/\sigma_1^2$ if (4) is satisfied.
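The step from the two acceptance rules to condition (4) can be written out explicitly. The following is a short worked check, not part of the original argument; it only inverts the second computer's inequality and compares end points.

$$
\begin{aligned}
\frac{A_{n_2 n_1}}{k} \le \frac{s_2^2}{s_1^2} \le \frac{B_{n_2 n_1}}{k}
\quad &\Longleftrightarrow \quad
\frac{k}{B_{n_2 n_1}} \le \frac{s_1^2}{s_2^2} \le \frac{k}{A_{n_2 n_1}}, \\
\text{so the two acceptance regions coincide for every } k > 0
\quad &\Longleftrightarrow \quad
A_{n_1 n_2} = \frac{1}{B_{n_2 n_1}} \ \text{ and } \ B_{n_1 n_2} = \frac{1}{A_{n_2 n_1}} .
\end{aligned}
$$

Both equalities are instances of (4), the second with the roles of the two degrees of freedom interchanged, which is why (4) is required for all pairs of positive integers.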


5. Logarithmically shortest confidence intervals. The length of the confidence
intervals of §2 is $L = T(A^{-1} - B^{-1})$. We might consider choosing $A$, $B$
in such a way that $\mathcal{E}(L)$ is minimum. This leads to the problem of minimizing
$A^{-1} - B^{-1}$ subject to (2). It might seem just as desirable, however, to minimize
the expected length of the confidence interval for $\theta^{1/2}$,

$$ (T/B)^{1/2} \le \sigma_1/\sigma_2 \le (T/A)^{1/2}. $$

This leads to a different problem with a different solution.

The condition on confidence intervals for $\theta$ which appears intuitively desirable
to the writer, is that the limits $\underline{\theta}$, $\bar{\theta}$ of the confidence interval
$\underline{\theta}(E_1, E_2) \le \theta \le \bar{\theta}(E_1, E_2)$ be such that $\mathcal{E}(\log \bar{\theta} - \log \underline{\theta})$ is minimum.
For the confidence intervals of §2 this is equivalent to minimizing $B/A$, and by using the method of
Lagrange's multipliers we easily find that

$$ \left[ u\,h_{n_1 n_2}(u) \right]_{u=A}^{u=B} = 0 \tag{5} $$

and (2) must be satisfied. Denote the solution⁵ by $A^{\mathrm{III}}_{n_1 n_2}$, $B^{\mathrm{III}}_{n_1 n_2}$. It is evident
that the same condition (5) is obtained if we ask for logarithmically shortest
confidence intervals (based on the F-distribution) for $\theta^k$ where $k > 0$.

The numerical values of the limits $A^{\mathrm{III}}$, $B^{\mathrm{III}}$ are difficult to calculate if $n_1 \ne n_2$.
The best procedure seems to be to transform to the incomplete Beta distribution
by means of (1) and to calculate the corresponding points $a^{\mathrm{III}}_{n_1 n_2}$, $b^{\mathrm{III}}_{n_1 n_2}$ from the
equations

$$ \left[ I_x(\tfrac12 n_1, \tfrac12 n_2) \right]_a^b = \left[ I_x(\tfrac12 n_1 + 1, \tfrac12 n_2) \right]_a^b = 1 - \alpha. \tag{6} $$

The points $a$, $b$ can be found to two decimals by inspection of Pearson's tables
[9]. Unfortunately, in the many cases where $a$ is close to 0, or $b$ to 1, $A^{\mathrm{III}}$, $B^{\mathrm{III}}$
are then subject to enormous error when calculated from (1).
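With a computer the limits can be obtained numerically instead of by table inspection. The sketch below is an illustration only, not the procedure of the paper: it works on the Beta scale of (1), integrates the Beta density by Simpson's rule, and solves (2) together with (5) by nested bisection. All function names are hypothetical, only the Python standard library is used, and the code assumes $n_1, n_2 > 2$ so that the Beta density vanishes at both ends. Equal-tails limits are computed alongside for comparison.

```python
import math


def beta_mass(a, b, p, q, steps=200):
    """Integral of the Beta(p, q) density over [a, b] by composite Simpson's rule.

    Assumes p, q > 1, so the density is bounded and is zero at 0 and 1.
    """
    if b <= a:
        return 0.0
    log_norm = math.lgamma(p + q) - math.lgamma(p) - math.lgamma(q)

    def pdf(x):
        if x <= 0.0 or x >= 1.0:
            return 0.0
        return math.exp(log_norm + (p - 1) * math.log(x) + (q - 1) * math.log(1 - x))

    h = (b - a) / steps
    total = pdf(a) + pdf(b)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * pdf(a + k * h)
    return total * h / 3


def find_root(f, lo, hi, iters=40):
    """Bisection for a function that is negative at lo and positive at hi."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def to_F(x, n1, n2):
    """Transformation (1): F = (n2 / n1) * x / (1 - x)."""
    return (n2 / n1) * x / (1 - x)


def equal_tails_limits(n1, n2, alpha):
    """Limits cutting off an area alpha/2 in each tail of the F-distribution."""
    p, q = n1 / 2, n2 / 2
    a = find_root(lambda x: beta_mass(0.0, x, p, q) - alpha / 2, 0.0, 1.0)
    b = find_root(lambda x: beta_mass(0.0, x, p, q) - (1 - alpha / 2), 0.0, 1.0)
    return to_F(a, n1, n2), to_F(b, n1, n2)


def unbiased_limits(n1, n2, alpha):
    """Limits satisfying (2) and (5), found on the Beta scale."""
    p, q = n1 / 2, n2 / 2

    def upper_for(a):
        # Upper point b with area 1 - alpha between a and b (b = 1 if unattainable).
        if beta_mass(a, 1.0, p, q) <= 1 - alpha:
            return 1.0
        return find_root(lambda x: beta_mass(a, x, p, q) - (1 - alpha), a, 1.0)

    def weight(x):
        # u * h(u) is proportional to x^p (1 - x)^q under transformation (1).
        return x ** p * (1 - x) ** q

    a = find_root(lambda x: weight(x) - weight(upper_for(x)), 0.0, 1.0)
    return to_F(a, n1, n2), to_F(upper_for(a), n1, n2)


print(unbiased_limits(10, 20, 0.05))
print(equal_tails_limits(10, 20, 0.05))
```

Because the root is located directly rather than read from a two-decimal table, the loss of accuracy in passing back through (1) is avoided, at the cost of a few hundred thousand density evaluations.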


6. Reciprocal limits. While the problems (i) and (ii) are closely related, the
last choice of limits was suggested solely by our consideration of (ii). Later
we will reconsider this choice from the standpoint of (i); the reader may
anticipate that it will again be found advantageous in some respect. For the
present, we proceed to three further choices, these arising from various approaches
to (i).

The procedure recommended in several statistics manuals (see §8) for testing
the hypothesis $\theta = 1$ is to refer the quotient of the larger of $s_1^2$, $s_2^2$ by the smaller
to tables. This suggests the introduction of a statistic $M$ defined as the maximum
of $T$, $T^{-1}$. Its distribution⁶ under the hypothesis $\theta = 1$ is easily found:
Let $g_{n_1 n_2}(M)$ be its p.d.f. Then for $1 \le u \le \infty$,

$$
\begin{aligned}
g_{n_1 n_2}(u)\,du &= \Pr(u < M < u + du \mid \theta = 1) \\
&= \Pr(u < T < u + du \ \text{or}\ u < T^{-1} < u + du) \\
&= \Pr(u < T < u + du) + \Pr(u < T^{-1} < u + du),
\end{aligned}
$$

since the last two terms are the probabilities of mutually exclusive events.
Furthermore, the first term is $h_{n_1 n_2}(u)\,du$, and because of the symmetry induced
by $\theta = 1$ we can evaluate the second term by merely interchanging subscripts.
Hence the desired distribution is

$$ g_{n_1 n_2}(u) = h_{n_1 n_2}(u) + h_{n_2 n_1}(u), $$

regardless of the true values of the nuisance parameters.





5 It can be shown by elementary methods that the solution of these equations exists and 
is unique; likewise for the solutions later denoted by superscripts IV and V. 
6 Considered by K. Pearson [8]. 
If we reject the hypothesis $\theta = 1$ if $M > M_{n_1 n_2}$, where

$$ \int_{M_{n_1 n_2}}^{\infty} g_{n_1 n_2}(u)\,du = \alpha, $$

then this significance test is easily shown to be the same as that of §2 with
$\theta_0 = 1$ and

$$ A_{n_1 n_2} = B^{-1}_{n_1 n_2}, $$

the upper limit $B_{n_1 n_2}$ being $M_{n_1 n_2}$. We remark that again these limits are not easy to compute if $n_1 \ne n_2$.
While this choice of $A$, $B$, which we shall call $A^{\mathrm{IV}}_{n_1 n_2}$, $B^{\mathrm{IV}}_{n_1 n_2}$, has been motivated
only for the case $\theta_0 = 1$, it leads of course to a test IV for any $\theta_0$ and a set IV
of confidence intervals.


7. The likelihood ratio. Since the properties of $\lambda$-criteria in general have
received much attention in the literature, and since in particular the $\lambda$-test for
$H_0$ is equivalent to a certain choice of $A$, $B$, we shall mention it here, and see
whether it has any advantages in §9. $\lambda$ for $H_0$ in the case $\theta_0 = 1$ was given by
Pearson and Neyman [7; their $H_1$, $n_i$, $s_i^2$, $\theta$, $\lambda_{H_1}$ are our $H_0$, $N_i$, $s_i^2(N_i - 1)/N_i$,
$N_1(N_2 - 1)/\{N_2(N_1 - 1)T\}$, $\lambda$]; for any $\theta_0$ it may be shown to be

$$ \lambda = c_{n_1 n_2}\, F^{3/2} \left(1 + \frac{n_1}{n_2} F\right)^{-1} h_{n_1 n_2}(F). $$

On considering the (bell-shaped) graph of $\lambda$ against $F$ we see that $\lambda < \lambda_0$ corresponds
to two intervals, say $0 \le F < F'$ and $F'' < F \le \infty$. The $\lambda$-test,
which consists of rejecting $H_0$ when $\lambda < \lambda_0$, where $\lambda_0$ is determined so that the
significance level is $\alpha$, is thus equivalent to test V with $A^{\mathrm{V}}_{n_1 n_2}$, $B^{\mathrm{V}}_{n_1 n_2}$ satisfying
(2) and

$$ \left[ u^{3/2} \left(1 + \frac{n_1}{n_2} u\right)^{-1} h_{n_1 n_2}(u) \right]_{u=A}^{u=B} = 0. $$

8. Equal tails. Perhaps the most venerable procedure for determining limits
on a distribution for a significance test in a "two-sided" case is to choose them
so that the tails of the distribution have equal areas. Define $A^{\mathrm{VI}}_{n_1 n_2}$, $B^{\mathrm{VI}}_{n_1 n_2}$ from

$$ \int_0^{A^{\mathrm{VI}}_{n_1 n_2}} h_{n_1 n_2}(u)\,du = \int_{B^{\mathrm{VI}}_{n_1 n_2}}^{\infty} h_{n_1 n_2}(u)\,du = \tfrac12 \alpha. $$

The values of $B^{\mathrm{VI}}_{n_1 n_2}$ for $\alpha = .10$ and $.02$ are given in the F-tables [12; same
$n_1$, $n_2$ as ours] as 5% and 1% points. The relation

$$ A^{\mathrm{VI}}_{n_1 n_2} B^{\mathrm{VI}}_{n_2 n_1} = 1 \tag{7} $$

is easy to get, and hence $A^{\mathrm{VI}}_{n_1 n_2}$ for these values of $\alpha$ may also be calculated from
the F-tables. The limits for $\tfrac12 \alpha = .25, .10, .025, .005$ can be calculated from
(1), (7), and Thompson's tables [13].


Since test VI will later be seen to have some merit we will discuss it somewhat
further at this point: In several statistics texts [e.g., 3, 14] the student is told
to take the quotient of the larger by the smaller of $s_1^2$, $s_2^2$, refer it to the F-table,
taking the $n_1$ of the table to be the $n_i$ of the numerator, and to reject the null
hypothesis $\theta = 1$ if the sample value is larger than the tabulated. It is then
further stated without proof that in using the 5% or 1% points of the F-table,
the significance level is actually 10% or 2%. Since the quotient thus referred
to the table is precisely the statistic $M$ of §6, it would seem logical to refer it
to an $M$-table rather than the F-table! However, the above procedure can be
justified⁷ as follows: The equation (7) tells us that test VI fulfills the symmetry
condition (4). It makes no difference then in his conclusions whether the
computer uses the statistic $s_1^2/s_2^2$ and the distribution $h_{n_1 n_2}(F)$ or $s_2^2/s_1^2$ and
$h_{n_2 n_1}(F)$. In particular he may always use the larger ratio and $h_{mn}(F)$, where
$m$ and $n$ are the "degrees of freedom" of numerator and denominator, respectively.
Since this statistic cannot fall in the lower tail, he need consider only
whether the calculated value exceeds the tabulated. But in using the value
tabulated as the upper $p$% point of the F-distribution, he makes his test at the $2p$%
significance level.
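The doubling of the significance level can be verified in one line. The display below is a worked check under two labeled assumptions: $B^{(p)}_{mn}$ is our own notation for the tabulated upper $p$% point of $h_{mn}$, and $p$ is small enough that this point exceeds 1, so that the two rejection events cannot occur together.

$$
\Pr(\text{reject} \mid \theta = 1)
= \Pr\!\left(T > B^{(p)}_{n_1 n_2}\right) + \Pr\!\left(T^{-1} > B^{(p)}_{n_2 n_1}\right)
= \frac{p}{100} + \frac{p}{100}
= \frac{2p}{100},
$$

since under $\theta = 1$ the statistic $T$ has the density $h_{n_1 n_2}$ and $T^{-1}$ has the density $h_{n_2 n_1}$.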





9. Comparison of the tests and confidence intervals. We now have at hand 
two one-tailed and four two-tailed tests, and corresponding sets of confidence 
intervals, all based on the F-distribution. We note at this point that all four 
of the two-tailed tests satisfy the symmetry condition (4), and that in the special 
case $n_1 = n_2$, these four tests become identical. In comparing any two tests,
an instrument which makes their relative advantages completely anschaulich 
is the power curve (surface in a more complicated case). The definition and 
interpretation of the power curve of a test are based on the insight of Neyman 
and Pearson [5] that two types of error are possible in applying a test: We 
may (I) reject the hypothesis when it is true, or (II) accept it when it is false. 

We see immediately that for any test of the class considered in §2, the probability
of a type I error is the same, namely $\alpha$. To find the probability of a
type II error, let us introduce a little more terminology: We denote by $E$ the
sample point $(E_1, E_2)$ and by $w$ the region of sample space defined by

$$ T < A\theta_0 \quad \text{and} \quad T > B\theta_0. \tag{8} $$

$w$ is called the critical region of the test: the test rejects $H_0$ if and only if $E$ falls
in $w$. The probability of this, which is called the power of the test, is

$$ 1 - \Pr(A\theta_0/\theta \le T/\theta \le B\theta_0/\theta \mid \theta, \sigma_2^2, a_1, a_2). $$








Since in the present case this happens to be completely independent of the true
values of the nuisance parameters, even for $\theta \ne \theta_0$, let us write it as $P(w \mid \theta)$.
Then

$$ P(w \mid \theta) = 1 - \int_{A\theta_0/\theta}^{B\theta_0/\theta} h_{n_1 n_2}(u)\,du. \tag{9} $$

Finally, by the power curve of the test we mean simply the graph of the power
$P(w \mid \theta)$ as a function of $\theta$.

7 The writer is indebted to Mr. T. W. Anderson, Jr. for pointing out to him that it is not
necessary to use the $M$-distribution.

We may now state the probability of a type II error: it is $1 - P(w \mid \theta)$, where
necessarily $\theta \ne \theta_0$. Hence the ordinate on the power curve for $\theta \ne \theta_0$ is the
probability of avoiding a type II error, while for $\theta = \theta_0$ it is the probability of
making a type I error. By inspection of equation (9) we find that, barring the
cases $B = \infty$ or $A = 0$ (tests I and II), $P(w \mid \theta) \to 1$ as $\theta \to 0$ or $\infty$. We calculate
the derivative to be

$$ P'(w \mid \theta) = \left[ u\,h_{n_1 n_2}(u)/\theta \right]_{u = A\theta_0/\theta}^{u = B\theta_0/\theta}, \tag{10} $$

which is obviously continuous for $0 < \theta < \infty$. If we equate this to zero we find
a unique solution for $\theta$, and hence the power curve has a single minimum point.
In the exceptional case $B = \infty$ we see from (9) that $P(w \mid \theta)$ decreases monotonically
from 1 to 0 as $\theta$ increases from 0 to $\infty$; in the case $A = 0$, $P(w \mid \theta)$
increases monotonically from 0 to 1. Some power curves⁸ are plotted in fig. 1.

[FIG. 1. Power curves $P(w \mid \theta)$ of tests I, II, III, VI.]
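The derivative (10) follows from (9) by differentiating under the integral limits. The steps are written out below as a check; the symbol $H$ for the cumulative distribution function of $h_{n_1 n_2}$ is introduced here only for this purpose.

$$
\begin{aligned}
P(w \mid \theta) &= 1 - H(B\theta_0/\theta) + H(A\theta_0/\theta), \\
P'(w \mid \theta) &= \frac{B\theta_0}{\theta^2}\, h_{n_1 n_2}\!\left(\frac{B\theta_0}{\theta}\right)
 - \frac{A\theta_0}{\theta^2}\, h_{n_1 n_2}\!\left(\frac{A\theta_0}{\theta}\right)
 = \frac{1}{\theta} \Big[ u\,h_{n_1 n_2}(u) \Big]_{u = A\theta_0/\theta}^{u = B\theta_0/\theta}, \\
P'(w \mid \theta_0) &= \frac{1}{\theta_0} \Big( B\,h_{n_1 n_2}(B) - A\,h_{n_1 n_2}(A) \Big).
\end{aligned}
$$

The last line shows that the slope of the power curve at $\theta_0$ vanishes exactly when $A\,h_{n_1 n_2}(A) = B\,h_{n_1 n_2}(B)$, which is condition (5).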

Always understanding by $w$ a region of the set defined by (8), and recalling
the above interpretation of the ordinate on the power curve, we are led to ask
whether there is not a $w$, say $w_0$, whose power curve nowhere drops below any
other curve $P = P(w \mid \theta)$. (They all pass through $(\theta_0, \alpha)$.) The test based
on such a region $w_0$ would be called uniformly most powerful (UMP) of the class
considered, and obviously would be preferred under any circumstances. Alas,
it does not exist. Perhaps some insight into the fact of the general non-existence
of UMP tests can be gained by returning to fig. 1. While fig. 1 is for the case
$n_1 = 10$, $n_2 = 20$, and $\alpha = .05$, the following remarks are valid for any $n_1$, $n_2$, $\alpha$:
We note that for testing $H_0$ against alternatives $\theta < \theta_0$ test I is far superior to the
other three, indeed it is superior to any of the tests of the class defined by (8)
in the sense that its power curve lies above that of any of the other tests.⁹ But
for alternatives $\theta > \theta_0$, test I is seen to be very poor (the worst possible, it can
be shown). Similar remarks apply to test II and the complementary alternatives.
This constitutes the more convincing explanation promised in §3 of the
superiority of tests I and II in the "one-sided" cases. Since the power curve
of test I lies above all other power curves for $\theta < \theta_0$, and that of test II above
all for $\theta > \theta_0$, it is now clear that there is no UMP test of the class considered.

8 Power curves for test V may be found in a paper by Brown [1]. It did not seem worthwhile
to construct curves for test IV, since the limits are hard to compute, the test is biased,
and has little historical interest.

To cope with the commonly occurring situation where there is no UMP test,
Neyman and Pearson [5] defined an unbiased test: one whose power curve has
an absolute minimum at $\theta_0$. The desirability of an unbiased test in the "two-sided"
case is evident when we note that if a test is biased, the probability that
we accept the hypothesis $\theta = \theta_0$ is greater if $\theta$ has certain values $\theta \ne \theta_0$ than if
$\theta = \theta_0$. To find which, if any, of our tests is unbiased, we equate expression
(10) to zero for $\theta = \theta_0$. As a result we find¹⁰ the condition (5) which determines
test III.

We see now that the limits $A^{\mathrm{III}}$, $B^{\mathrm{III}}$ yield the preferred test in the "two-sided"
case, as well as the logarithmically shortest confidence intervals. However, as
pointed out in §5, the numerical values of these limits are difficult to calculate,
and the question then arises, do we lose much by using instead the easily obtained
"equal tails" limits $A^{\mathrm{VI}}$, $B^{\mathrm{VI}}$? In the case $n_1 = 10$, $n_2 = 20$, $\alpha = .05$,
fig. 1 shows that the power curves of tests III and VI differ very little. The
extent of the bias of test VI for other values of $n_1$, $n_2$, and $\alpha = .05, .01$ is indicated
in table I. (The missing diagonal entries are all 1, 5 or 1, 1.) Let
us call the entries $\beta$, $100\bar{\alpha}$, where $\beta = \theta_{\min}/\theta_0$, $\bar{\alpha} = P(w^{\mathrm{VI}} \mid \theta_{\min})$. From (10)
and (1) we get the following formula for computing $\beta$:

$$ \beta = \left( \mathfrak{b} - \mathfrak{a}\,Q^{n_1/(n_1 + n_2)} \right) \Big/ \left( Q^{n_1/(n_1 + n_2)} - 1 \right), $$

where

$$ Q = \mathfrak{b}/\mathfrak{a}, \qquad \mathfrak{a} = a/(1 - a), \qquad \mathfrak{b} = b/(1 - b), $$

and $a$ and $1 - b$ are the $100(\tfrac12 \alpha)$% points on the incomplete Beta distribution
for $\nu_2 = n_1$, $\nu_1 = n_2$, and $\nu_1 = n_1$, $\nu_2 = n_2$, respectively, in the notation of
Thompson's tables [13]. $\bar{\alpha}$ may then be computed by transforming (9),

$$ \bar{\alpha} = 1 - \left[ I_x(\tfrac12 n_1, \tfrac12 n_2) \right]_{x = (1 + \beta/\mathfrak{a})^{-1}}^{x = (1 + \beta/\mathfrak{b})^{-1}}, $$


9 The reader may prove this from (9) or note that it is a special case of the
results of §10.

10 The equivalent condition on the incomplete Beta distribution was given by Pitman
[10] for the case $\theta_0 = 1$.








and using Pearson's tables [9], or, when $x$ is very close to 0 or 1, using a few
terms of the series

$$ I_x(\tfrac12 m, \tfrac12 n) = 1 - I_{1-x}(\tfrac12 n, \tfrac12 m) = \frac{x^{\frac12 m}}{B(\frac12 m, \frac12 n)} \left[ \frac{2}{m} - \frac{n - 2}{2^0 (m + 2)}\,\frac{x}{1!} + \frac{(n - 2)(n - 4)}{2 (m + 4)}\,\frac{x^2}{2!} - \frac{(n - 2)(n - 4)(n - 6)}{2^2 (m + 6)}\,\frac{x^3}{3!} + \cdots \right]. $$


TABLE I

Minimum points of power curves of test VI
The entries are $\theta_{\min}/\theta_0$, $100\,P(w^{\mathrm{VI}} \mid \theta_{\min})$.
Roman type for $\alpha = .05$, bold face for $\alpha = .01$

In computing $\beta$, $\bar{\alpha}$ it is perhaps simplest to take $n_1 > n_2$ and use the relationships

$$ \beta_{n_1 n_2} = \beta^{-1}_{n_2 n_1}, \qquad \bar{\alpha}_{n_1 n_2} = \bar{\alpha}_{n_2 n_1}. $$

When sample sizes $n_1 + 1$, $n_2 + 1$ are such that table I indicates a large bias,
it might be worthwhile to get limits for an unbiased test from the "equal tails"
limits as follows: The limits $A^{\mathrm{III}}$, $B^{\mathrm{III}}$ for an unbiased test III may be obtained
by taking

$$ A^{\mathrm{III}} = A^{\mathrm{VI}}/\beta, \qquad B^{\mathrm{III}} = B^{\mathrm{VI}}/\beta, $$

but the test will then be at significance level $\bar{\alpha}$. The gain in using $A^{\mathrm{III}}$, $B^{\mathrm{III}}$ instead
of $A^{\mathrm{VI}}$, $B^{\mathrm{VI}}$ is more apparent when we consider confidence intervals: The sets
associated with $A^{\mathrm{III}}$, $B^{\mathrm{III}}$, and $A^{\mathrm{VI}}$, $B^{\mathrm{VI}}$ have the same logarithmic lengths, but
the confidence coefficients are $1 - \bar{\alpha}$ and $1 - \alpha$, respectively.

This seems to be about as far as it is worthwhile to carry the developments
at the elementary level of part I. Some inadequacies may already have disturbed
the reader: Why not consider in place of the interval $(A, B)$ on the range of $F$
any measurable region¹¹ $R$ such that the integral of $h_{n_1 n_2}(F)$ over $R$ is $1 - \alpha$?
Under the transformation $T = \theta_0 F$ the complement of $R$, just as the complement
of $(A, B)$, would lead to critical regions $w$ for which $P(w \mid \theta_0) = \alpha$ for all values
of the nuisance parameters. Critical regions satisfying the last condition are
said to be similar to the sample space with regard to the nuisance parameters.
More generally, how would our preferred tests I, II, III stand up if we admit
for comparison, tests based on any similar regions whatever? Finally, how
can one formulate in a general way conditions for optimum confidence intervals,
and would a more general formulation still lead to the preference of the sets
I, II, III? Answers to these questions will be found in part II.





Part II. SIGNIFICANCE TESTS AND CONFIDENCE INTERVALS BASED ON ANY 
SIMILAR REGIONS


10. Common best critical regions. For the case $\theta_0 = 1$, Neyman and Pearson
[6] have shown that the critical region of test I is the common best critical
(CBC) region for testing $H_0$ against alternatives $\theta < \theta_0$. This result is easily
extended to any $\theta_0$ by a simple device. We consider the following 1:1 transformations
of variables and parameters:





11 Our intuitions may balk at the notion of using sets $R$ more general than intervals, but
it would nevertheless be reassuring to find that our tests can meet this competition.
$$ x'_{1j} = \theta_0^{-1/2} x_{1j}, \qquad x'_{2k} = x_{2k}, \qquad j = 1, 2, \dots, N_1; \ k = 1, 2, \dots, N_2, \tag{11} $$

$$ \theta = \theta_0 \theta', \qquad \sigma_2^2 = (\sigma_2')^2, \qquad a_1 = \theta_0^{1/2} a_1', \qquad a_2 = a_2'. \tag{12} $$

Denote by $E_1'$, $E_2'$, $E'$ the points corresponding to $E_1$, $E_2$, $E$, respectively,
under the transformation (11), by $\vartheta$ any point in the space of the three nuisance
parameters, and by $\vartheta'$ its correspondent under the transformation (12), by
$H_0'$ the transformed hypothesis, $H_0'$: $\theta' = 1$; $\vartheta'$, unspecified. If $w$ is any Borel-measurable
region of the space of $E$, and $w'$ the map of $w$ under (11), then
$\Pr(E \in w \mid \theta, \vartheta) = \Pr(E' \in w' \mid \theta', \vartheta')$, which we shall write as

$$ P(w \mid \theta, \vartheta) = P(w' \mid \theta', \vartheta'). \tag{13} $$

We note that the coordinates of $E_i'$ are normally distributed with mean $a_i'$
and variance $(\sigma_i')^2$ where $(\sigma_1')^2 = \theta'(\sigma_2')^2$, all $N_1 + N_2$ coordinates being
statistically independent. Designating the critical region of test I by $w_0$,
and its map under (11) by $w_0'$, the result of Neyman and Pearson may then be
stated as follows: $w_0'$ is a CBC region for $H_0'$ and alternatives $\theta' < 1$. Now
suppose $w_0$ were not a CBC region for $H_0$ and alternatives $\theta < \theta_0$. Then there
would exist a region $w_1$, a value $\theta_1 < \theta_0$, and a point $\vartheta_1$ such that $P(w_1 \mid \theta_1, \vartheta_1) >
P(w_0 \mid \theta_1, \vartheta_1)$, while $P(w_1 \mid \theta_0, \vartheta) = \alpha$ for all $\vartheta$. Let $w_1'$, $\theta_1'$, $\vartheta_1'$ correspond to
$w_1$, $\theta_1$, $\vartheta_1$ under (11) and (12). Then from (13) we would have that
$P(w_1' \mid \theta_1', \vartheta_1') > P(w_0' \mid \theta_1', \vartheta_1')$, where $\theta_1' < 1$, while $P(w_1' \mid 1, \vartheta') = \alpha$ for all $\vartheta'$.
But this would contradict the fact that $w_0'$ is a CBC region for $H_0'$ and alternatives
$\theta' < 1$.

The proof that the critical region of test II is a CBC region for testing Ho 
against alternatives $\theta > \theta_0$ is of course completely analogous. This establishes
the non-existence of a UMP test for Ho , and so we consider next the existence 
of a “‘best”’ unbiased test. 


11. Type $B_1$ region. This section is a direct application of a recent paper
"On the theory of testing composite hypotheses with one constraint" to which we
shall refer as [11]. Since it is not feasible to restate here the definitions, assumptions,
and theorems of [11], we shall refer to them by their numbers there. It is
convenient to transform the parameters of the p.d.f. of $E$ by putting

$$ \theta = 1/\psi, \qquad \sigma_1^2 = 1/(\psi h), \qquad \sigma_2^2 = 1/h. \tag{14} $$

Then

$$ p(E \mid \psi, h, a_1, a_2) = (2\pi)^{-\frac12 N} \psi^{\frac12 N_1} h^{\frac12 N} \exp\left\{ -\tfrac12 \psi h \left[ N_1(\bar{x}_1 - a_1)^2 + S_1 \right] - \tfrac12 h \left[ N_2(\bar{x}_2 - a_2)^2 + S_2 \right] \right\}, \tag{15} $$

where

$$ N = N_1 + N_2, \qquad S_i = n_i s_i^2. $$


We note that type $B$ and type $B_1$ regions (definitions 1, 2 in [11]) are invariant
under certain transformations of parameters: Suppose new parameters $\theta'$, $\vartheta'$
are introduced by 1:1 transformations $\theta = \theta(\theta')$, $\vartheta = \vartheta(\vartheta')$. Let $\theta_0'$ correspond
to $\theta_0$, and consider the transformed hypothesis $H_0'$: $\theta' = \theta_0'$; $\vartheta'$, unspecified.
Sufficient conditions that a region be of type $B$ for testing $H_0'$ if it is of type $B$
for testing $H_0$ are that the function $\theta(\theta')$ have first and second derivatives and
that the first not vanish at $\theta_0'$. The last statement remains true if $B$ is replaced
by $B_1$. Since the transformations (14) satisfy these sufficient conditions, we
define

$$ H_0': \quad \psi = \psi_0; \quad \vartheta' = (h, a_1, a_2), \ \text{unspecified}, $$

and propose to show that there exists a type $B_1$ region for testing $H_0'$, and that
it is the critical region of test III.

For later reference we now note that the four functions of variables and
parameters defined in Table II are mutually independently distributed as
indicated there.


TABLE II

Function                                                                  Distribution
$U_1 = \psi h S_1 = S_1/\sigma_1^2$                                       $\chi^2$, with $n_1$ degrees of freedom
$U_2 = h S_2 = S_2/\sigma_2^2$                                            $\chi^2$, with $n_2$ degrees of freedom
$u_3 = (\psi h N_1)^{1/2}(\bar{x}_1 - a_1) = N_1^{1/2}(\bar{x}_1 - a_1)/\sigma_1$   normal, with zero mean and unit variance
$u_4 = (h N_2)^{1/2}(\bar{x}_2 - a_2) = N_2^{1/2}(\bar{x}_2 - a_2)/\sigma_2$        normal, with zero mean and unit variance


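Let us first verify the critical assumption 3° of [11]: Identifying our $\psi$, $h$, $a_1$, $a_2$
with $\theta_1$, $\theta_2$, $\theta_3$, $\theta_4$ of [11], we find from (15) that

$$
\begin{aligned}
\varphi_1 &= \tfrac12 \left\{ N_1/\psi - h \left[ N_1(\bar{x}_1 - a_1)^2 + S_1 \right] \right\}, \\
\varphi_2 &= \tfrac12 \left\{ N/h - \psi \left[ N_1(\bar{x}_1 - a_1)^2 + S_1 \right] - \left[ N_2(\bar{x}_2 - a_2)^2 + S_2 \right] \right\}, \\
\varphi_3 &= \psi h N_1(\bar{x}_1 - a_1), \\
\varphi_4 &= h N_2(\bar{x}_2 - a_2),
\end{aligned}
\tag{16}
$$

and then check 3° by differentiating equations (16).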

rm . . ( 

lo verify assumption 4’, let x, 22, x3, 24 of [11] be our an, te, %1, 2, 
respectively. We calculate 


O(d1, d2, 3, b4) 
O(a, » U2, U3, La) 


= Whi(2x4 — 22)(a4 — 23), 


which vanishes only on the same set of probability zero for all admissible values 
of the parameters. The validity of assumption 5’ follows from §5 of [11], and 
there is no difficulty in verifying 1° and 2’. 
To apply theorem 1 of [11] we must find functions $k_i(\phi_2, \phi_3, \phi_4; \psi_0, \vartheta')$,
i = 1, 2, such that

(17)  $\int_{k_1}^{k_2} \phi_1^t\, p(\phi_1, \phi_2, \phi_3, \phi_4 \mid \psi_0, \vartheta')\, d\phi_1 = (1 - \alpha) \int_{-\infty}^{+\infty} \text{same},$

for t = 0, 1, where the symbols $\phi_i$ henceforth are understood to stand for the functions
(16) with $\psi$ replaced by $\psi_0$. If the functions $k_i$ exist, then the region in
sample space defined by

(18)  $\phi_1 < k_1$ and $\phi_1 > k_2$

is independent of $\vartheta'$ and of type B.
From equations (16) and Table II we see that

(19)
    $\phi_1 = \tfrac{1}{2}(N_1 - u_1)/\psi_0, \qquad \phi_2 = \tfrac{1}{2}(N - u_2)/h,$
    $\phi_3 = (\psi_0 h N_1)^{1/2} u_3, \qquad \phi_4 = (h N_2)^{1/2} u_4,$

where

$u_1 = U_1 + u_3^2, \qquad u_2 = U_1 + U_2 + u_3^2 + u_4^2,$

and $\psi$ is put equal to $\psi_0$ in $U_1$, $u_3$. Furthermore, for fixed $u_2, u_3, u_4$, the range
of $u_1$ is

$u_3^2 \le u_1 \le u_2 - u_4^2.$
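Transforming the integrals in (17) by substituting (19) and

$p(\phi_1, \phi_2, \phi_3, \phi_4 \mid \psi_0, \vartheta') = \dfrac{p(U_1, U_2, u_3, u_4 \mid \psi_0, \vartheta')}{\dfrac{\partial(\phi_1, \phi_2, \phi_3, \phi_4)}{\partial(u_1, u_2, u_3, u_4)} \cdot \dfrac{\partial(u_1, u_2, u_3, u_4)}{\partial(U_1, U_2, u_3, u_4)}},$

where the p.d.f. in the numerator is, from Table II,

$C\, U_1^{\frac{1}{2}n_1 - 1}\, U_2^{\frac{1}{2}n_2 - 1} \exp(-\tfrac{1}{2}u_2),$

we get as the equivalent of (17)

$\int_{K_1}^{K_2} (N_1 - u_1)^t (u_1 - u_3^2)^{\frac{1}{2}n_1 - 1} (u_2 - u_4^2 - u_1)^{\frac{1}{2}n_2 - 1}\, du_1 = (1 - \alpha) \int_{u_3^2}^{u_2 - u_4^2} \text{same},$

with the limits $K_i(u_2, u_3, u_4; \psi_0, \vartheta')$ corresponding to $k_i(\phi_2, \phi_3, \phi_4; \psi_0, \vartheta')$.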
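Finally, we let

(20)  $x = (u_1 - u_3^2)/(u_2 - u_3^2 - u_4^2),$

and get

$\int_{\kappa_1}^{\kappa_2} [N_1 - u_3^2 - (u_2 - u_3^2 - u_4^2)x]^t\, x^{\frac{1}{2}n_1 - 1}(1 - x)^{\frac{1}{2}n_2 - 1}\, dx = (1 - \alpha) \int_0^1 \text{same},$

where $\kappa_i(u_2, u_3, u_4; \psi_0, \vartheta')$ are the values of x obtained by setting $u_1$ equal to
the function $K_i$ in (20). The last condition is equivalent to

(21)  $\int_{\kappa_1}^{\kappa_2} x^{\frac{1}{2}n_1 - 1 + t}(1 - x)^{\frac{1}{2}n_2 - 1}\, dx = (1 - \alpha) \int_0^1 \text{same}, \qquad t = 0, 1.$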
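The equivalence with (21) rests on the bracketed factor being linear in x. The step can be sketched as follows, writing A, B and w(x) as shorthand introduced only for this remark (they do not appear in the original): $A = N_1 - u_3^2$, $B = u_2 - u_3^2 - u_4^2$, $w(x) = x^{\frac{1}{2}n_1 - 1}(1 - x)^{\frac{1}{2}n_2 - 1}$.

```latex
% t = 0: the bracket equals 1
\int_{\kappa_1}^{\kappa_2} w(x)\,dx = (1-\alpha)\int_0^1 w(x)\,dx ,
% t = 1: the bracket equals A - Bx
\int_{\kappa_1}^{\kappa_2} (A - Bx)\,w(x)\,dx = (1-\alpha)\int_0^1 (A - Bx)\,w(x)\,dx .
% Subtract A times the first equation from the second and divide by -B
% (B = U_1 + U_2 > 0 with probability one):
\int_{\kappa_1}^{\kappa_2} x\,w(x)\,dx = (1-\alpha)\int_0^1 x\,w(x)\,dx .
```

The first and third lines together are exactly the two conditions t = 0 and t = 1 of (21), in which $u_2, u_3, u_4$ no longer appear.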


Since x is a continuous monotonic function of $\phi_1$, (18) becomes

(22)  $x < \kappa_1$ and $x > \kappa_2$.

Solutions for the functions $\kappa_1, \kappa_2$ satisfying (21) exist in the form $\kappa_i$ = constant.
Indeed, if we now note that the x defined by (20) is the same as that defined in
(1), and let $\kappa_1 = a$, $\kappa_2 = b$, we see that the conditions (21) are identical with
(6), and that our method of finding type B regions has led us to the critical
region of test III.

To show that the type B region obtained from Theorem 1 of [11] is also of
type B_1, we appeal to Theorem 2: From (15) we have

$p(E \mid \psi, \vartheta')/p(E \mid \psi_0, \vartheta') = (\psi/\psi_0)^{\frac{1}{2}N_1} \exp\{(\psi - \psi_0)(\phi_1 - \tfrac{1}{2}N_1/\psi_0)\}.$

Since for $\psi \ne \psi_0$ this function is convex in $\phi_1$, Theorem 2 is applicable. The
result of this section is the conclusion that the critical region of test III is of
type B_1 for testing $H_0$.





12. Neyman’s categories of confidence intervals. The concepts and terminology
of this section are those formulated in a basic paper [4] by Neyman.
Suppose a distribution depends on a parameter $\theta$, and on further parameters
$\theta_2, \theta_3, \ldots, \theta_l$ which we shall symbolize by $\vartheta$. The hypothesis

$H(\theta_0)$: $\theta = \theta_0$; $\vartheta$, unspecified,

may be called a composite hypothesis with one constraint [11]. Let E be the
sample point, W be the sample space, and w be any Borel-measurable region in W.
Write $\Pr\{E \in w \mid \theta, \vartheta\} = P\{w \mid \theta, \vartheta\}$. The condition that a critical region $w(\theta_0)$
for testing $H(\theta_0)$ be similar to W with respect to $\vartheta$ is

(23)  $P\{w(\theta_0) \mid \theta_0, \vartheta\} = \alpha$ for all $\vartheta$,

where $\alpha$ is fixed throughout our discussion. Suppose for every admissible $\theta_0$
there exists a similar region $w(\theta_0)$. The complementary region $A(\theta_0) =
W - w(\theta_0)$ we may call a region of acceptance. For any E we next define the
linear set $\delta(E)$ of points on the $\theta$-axis as the totality of points $\theta$ such that $E \in A(\theta)$.
The probability [4] that the random set $\delta(E)$ cover a value $\theta''$ if the true value
of $\theta$ is $\theta'$ is

(24)  $\Pr\{\theta'' \in \delta(E) \mid \theta', \vartheta\} = 1 - P\{w(\theta'') \mid \theta', \vartheta\},$

and hence from (23),

(25)  $\Pr\{\theta' \in \delta(E) \mid \theta', \vartheta\} = 1 - \alpha$

for all $\theta'$, $\vartheta$, and we might call the aggregate $\{\delta(E)\}$ a set of confidence regions
with confidence coefficient $1 - \alpha$. Now if all $\delta(E)$ are intervals, then they form
a set of confidence intervals.
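The construction of $\delta(E)$ from the family of acceptance regions can be sketched numerically. The example below is only an illustration under stated assumptions: it uses a much simpler problem than the variance ratio (a normal mean with known standard deviation, equal-tailed test, no nuisance parameter), and the function names `accepts` and `confidence_set` are hypothetical. It collects every grid value $\theta$ whose acceptance region $A(\theta)$ contains the observed sample point.

```python
import math
from statistics import NormalDist

def accepts(theta0, xbar, sd, n, alpha):
    """True if the sample mean lies in the acceptance region A(theta0) of the
    equal-tailed level-alpha test of H(theta0): mean = theta0 (sd assumed known)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return abs(xbar - theta0) <= z * sd / math.sqrt(n)

def confidence_set(xbar, sd, n, alpha, grid):
    """delta(E): all grid values theta such that E lies in A(theta)."""
    return [t for t in grid if accepts(t, xbar, sd, n, alpha)]

# Hypothetical data summary: mean 0.3 from n = 25 observations with sd = 1.
grid = [i / 1000 for i in range(-2000, 2001)]
delta = confidence_set(xbar=0.3, sd=1.0, n=25, alpha=0.05, grid=grid)
lo, hi = min(delta), max(delta)
```

Here every $\delta(E)$ is an interval, and its end points agree, up to the grid spacing, with the familiar closed form $\bar{x} \pm z\,\sigma/\sqrt{n}$.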

We have now shown that if $H(\theta_0)$ is a composite hypothesis with one constraint,
if for every admissible $\theta_0$ there exists a similar region $w(\theta_0)$ for testing
$H(\theta_0)$, and if the aggregate $\{\delta(E)\}$ determined by the family $\{w(\theta_0)\}$ consists of
intervals $\delta(E)$, then $\{\delta(E)\}$ is a set of confidence intervals. By similar use of
(24) the reader may prove that if furthermore each $w(\theta_0)$ of the family has the
property P of the table below, then the corresponding set $\{\delta(E)\}$ of confidence
intervals is of Neyman’s category C:

P: property of $w(\theta_0)$                          C: category of $\{\delta(E)\}$
gives UMP test                                        shortest
CBC for $\theta > \theta_0$ (or $\theta < \theta_0$)  best one-sided
gives unbiased test                                   unbiased
of type B                                             short unbiased
of type B_1                                           shortest unbiased


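We have taken the liberty of calling a set of one-sided confidence intervals

$\delta(E)$: $\underline{\theta}(E) \le \theta$ (or $\theta \le \bar{\theta}(E)$),

where $\underline{\theta}(E)$ and $\bar{\theta}(E)$ are Neyman’s unique lower and upper estimates, respectively,
best one-sided, and of calling a set $\{\delta_0(E)\}$ shortest unbiased if for all $\theta'$, $\vartheta$
it satisfies (25) and

(26)  $[\partial \Pr\{\theta' \in \delta(E) \mid \theta, \vartheta\}/\partial\theta]_{\theta = \theta'} = 0,$

while for any other set $\{\delta_1(E)\}$ satisfying (25) and (26), and all $\theta''$, $\theta'$, $\vartheta$,

$\Pr\{\theta'' \in \delta_0(E) \mid \theta', \vartheta\} \le \Pr\{\theta'' \in \delta_1(E) \mid \theta', \vartheta\}.$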


It follows immediately from this discussion that our sets II and I of con- 
fidence intervals are the best one-sided, and that the set III is not only a short, 
but the shortest, unbiased set. 

In conclusion, we remark that Neyman’s concept of the “shortness” of a set 
of confidence intervals strikes one at first as indirect,—to fully appreciate its 
elegance it is perhaps necessary to attempt the formulation of a general theory 
from a more naive approach,—and that it is then of interest to discover that 
in the present case his short unbiased set coincides with that reached by the 
direct intuitive (but obviously extremely limited) method of §5. 
SETTING OF TOLERANCE LIMITS WHEN THE SAMPLE IS LARGE
By ABRAHAM WALD 
Columbia University 


1. Introduction. Let $f(x_1, \ldots, x_p, \theta_1, \ldots, \theta_k)$ be the joint probability
density function of the variates $x_1, \ldots, x_p$ involving k unknown parameters
$\theta_1, \ldots, \theta_k$. A sample of size n is drawn from this population. Denote by
$x_{i\alpha}$ ($i = 1, \ldots, p$; $\alpha = 1, \ldots, n$) the $\alpha$-th observation on $x_i$. We will deal here
with the following two problems of setting tolerance limits, which are of importance
in the mass production of a product:

Problem 1. For any two positive numbers $\beta < 1$ and $\gamma < 1$ we have to construct
p pairs of functions of the observations $L_i(x_{11}, \ldots, x_{pn})$ and
$U_i(x_{11}, \ldots, x_{pn})$ ($i = 1, \ldots, p$) such that

(1)  $P\left\{\int_{L_p}^{U_p} \cdots \int_{L_1}^{U_1} f(t_1, \ldots, t_p, \theta_1, \ldots, \theta_k)\, dt_1 \cdots dt_p \ge \gamma \,\Big|\, \theta_1, \ldots, \theta_k\right\} = \beta,$

where for any relation R, $P(R \mid \theta_1, \ldots, \theta_k)$ denotes the probability that R holds,
calculated under the assumption that $\theta_1, \ldots, \theta_k$ are the true values of the parameters.

Problem 2. For any positive numbers $\beta < 1$, $\lambda < 1$ and for any positive integer
N we have to construct p pairs of functions of the observations $L_i(x_{11}, \ldots, x_{pn})$ and
$U_i(x_{11}, \ldots, x_{pn})$ with the following property: Let $y_{i\alpha}$ ($i = 1, \ldots, p$; $\alpha =
1, \ldots, N$) be the $\alpha$-th observation on the variate $x_i$ in a second sample of size N
drawn from the same population as the first sample. Denote by M
the number of different values of $\alpha$ for which the p inequalities

$L_i(x_{11}, \ldots, x_{pn}) \le y_{i\alpha} \le U_i(x_{11}, \ldots, x_{pn}) \qquad (i = 1, \ldots, p),$

are fulfilled. Then

(2)  $P(M \ge \lambda N \mid \theta_1, \ldots, \theta_k) = \beta,$

where $\theta_1, \ldots, \theta_k$ denote the unknown parameter values of the population from
which the observations $x_{i\alpha}$ and $y_{i\alpha}$ have been drawn.

The functions $L_i$ and $U_i$ are called the tolerance limits for the variate $x_i$.
We will say that $L_i$ is the lower, and $U_i$ the upper tolerance limit of $x_i$. In
general, there exist infinitely many tolerance limits $L_i$ and $U_i$ which are solutions
of Problem 1 or Problem 2. It is clear that the tolerance limits $L_i$ and
$U_i$ are the more favorable the smaller the difference $U_i - L_i$. Hence if there
exist several solutions for the tolerance limits $L_i$ and $U_i$ we should select that
one for which the difference $U_i - L_i$ becomes a minimum in some sense.

S. S. Wilks¹ gave a solution of Problems 1 and 2 in the univariate case, i.e.
if p = 1. It seems that Wilks’ solution is the best possible one if nothing is
known about the probability density function except that it is continuous.
However, if it is known a priori that the unknown density function is an element
of a k-parameter family of functions, it will in general be possible to derive
tolerance limits which are considerably better than those proposed by Wilks.

¹ S. S. Wilks, “Determination of sample sizes for setting tolerance limits,” Annals of
Math. Stat., Vol. 12 (1941). See also his paper on the same subject presented at the meeting
of the Institute of Mathematical Statistics in Poughkeepsie, September, 1942.

Wilks’ results can easily be extended to the multivariate case provided the
variates $x_1, \ldots, x_p$ are known to be independently distributed.² This is a
serious restriction, since in many practical cases the independence of the variates
$x_1, \ldots, x_p$ cannot be assumed. The case of dependent variates has not been
treated by Wilks.

In this paper we give a solution of problems 1 and 2 when the size n of the 
sample is large. In the next section a lemma is proved which will be used in 
the derivation of tolerance limits. In section 3 the univariate case is treated 
and in section 4 the results are extended to the multivariate case. 


2. A lemma. We will prove the following

LEMMA: Let $\{x_{1n}\}, \ldots, \{x_{rn}\}$ (n = 1, 2, ..., ad inf.) be r sequences of random
variables and let $a_1, \ldots, a_r$ be r constants such that the joint distribution of
$\sqrt{n}(x_{1n} - a_1), \ldots, \sqrt{n}(x_{rn} - a_r)$ converges with $n \to \infty$ towards the r-variate
normal distribution with zero means and finite non-singular covariance matrix
$\|\sigma_{ij}\|$ ($i, j = 1, \ldots, r$). Furthermore, let $g(u_1, \ldots, u_r)$ be a function of r
variables $u_1, \ldots, u_r$ which admits continuous first derivatives in the neighborhood
of the point $u_1 = a_1, \ldots, u_r = a_r$. Assume that at least one of the first partial
derivatives of $g(u_1, \ldots, u_r)$ is not zero at the point $u_1 = a_1, \ldots, u_r = a_r$. Then
the distribution of $\sqrt{n}[g(x_{1n}, \ldots, x_{rn}) - g(a_1, \ldots, a_r)]$ converges with $n \to \infty$
towards the normal distribution with zero mean and variance $\sigma^2 = \sum_j \sum_i \sigma_{ij} g_i g_j$,
where $g_i$ denotes the partial derivative of $g(u_1, \ldots, u_r)$ with respect to $u_i$ taken at
$u_1 = a_1, \ldots, u_r = a_r$.

Proof: Since the joint distribution of $\sqrt{n}(x_{1n} - a_1), \ldots, \sqrt{n}(x_{rn} - a_r)$
approaches an r-variate normal distribution with zero means and finite non-singular
covariance matrix, the probability that
(3) a ee . (¢=1,---,7r) 

Vn i v/n 
holds, converges to 1 with $n \to \infty$. From (3) and the continuity of the first
derivatives of $g(u_1, \ldots, u_r)$ it follows easily that for any positive $\epsilon$ the probability
that

(4)  $\sum_{i=1}^{r} \sqrt{n}(x_{in} - a_i) g_i - \epsilon < \sqrt{n}[g(x_{1n}, \ldots, x_{rn}) - g(a_1, \ldots, a_r)] < \sum_{i=1}^{r} \sqrt{n}(x_{in} - a_i) g_i + \epsilon$


2 This was mentioned by Wilks in his paper presented at the meeting of the Institute of 
Mathematical Statistics in Poughkeepsie, N. Y., September, 1942. 
holds, converges to 1 with $n \to \infty$. Since the limit distribution of
$\sum_i \sqrt{n}(x_{in} - a_i) g_i$ is normal with zero mean and variance equal to $\sum_j \sum_i \sigma_{ij} g_i g_j$,
our Lemma follows easily from the fact that the quantity $\epsilon$ in (4) can be chosen
arbitrarily small.
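The Lemma can be checked numerically in a simple case. The sketch below is an illustration only, with assumptions chosen for convenience and not taken from the paper: r = 2, $g(u_1, u_2) = u_1 u_2$, and $x_{1n}, x_{2n}$ the means of n independent normal observations. The helper names `delta_method_variance` and `simulate` are hypothetical.

```python
import math
import random

def delta_method_variance(cov, grad):
    """Limit variance sum_i sum_j sigma_ij * g_i * g_j stated in the Lemma."""
    r = len(grad)
    return sum(cov[i][j] * grad[i] * grad[j] for i in range(r) for j in range(r))

def simulate(n, reps, seed=0):
    """Empirical variance of sqrt(n) * [g(x1n, x2n) - g(a1, a2)] for g = u1 * u2,
    where x1n, x2n are means of n independent normal observations."""
    rng = random.Random(seed)
    a1, a2 = 2.0, 3.0   # assumed limits a_1, a_2
    s1, s2 = 1.0, 0.5   # assumed standard deviations of single observations
    vals = []
    for _ in range(reps):
        # The mean of n iid N(a, s^2) observations is exactly N(a, s^2 / n).
        x1 = rng.gauss(a1, s1 / math.sqrt(n))
        x2 = rng.gauss(a2, s2 / math.sqrt(n))
        vals.append(math.sqrt(n) * (x1 * x2 - a1 * a2))
    mean = sum(vals) / reps
    return sum((v - mean) ** 2 for v in vals) / (reps - 1)

cov = [[1.0, 0.0], [0.0, 0.25]]   # covariance matrix of sqrt(n) * (x_in - a_i)
grad = [3.0, 2.0]                 # (g_1, g_2) = (a_2, a_1) for g = u1 * u2
limit_var = delta_method_variance(cov, grad)
empirical_var = simulate(n=400, reps=20000)
```

With these assumed values the Lemma gives a limit variance of 10, and the simulated variance should lie close to it, the remaining gap being Monte Carlo error plus a term of order 1/n.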















3. The univariate case. In this section we assume that p = 1. Hence the
probability density function $f(x_1, \ldots, x_p, \theta_1, \ldots, \theta_k)$ is replaced by the univariate
density function $f(x, \theta_1, \ldots, \theta_k)$. In order to simplify the notations,
the letter $\theta$ without any subscript will be used to denote the set of parameter
values $\theta_1, \ldots, \theta_k$.

For any positive $\xi < 1$ let $\varphi(\theta, \xi)$ and $\psi(\theta, \xi)$ be two functions of $\theta$ such that

(5)  $\int_{\varphi(\theta, \xi)}^{\psi(\theta, \xi)} f(x, \theta)\, dx = \xi.$

If $f(x, \theta)$ is a continuous function of x, functions $\varphi(\theta, \xi)$ and $\psi(\theta, \xi)$ satisfying (5)
exist. It is clear that for any function $\varphi(\theta, \xi)$ subject to the condition

$\int_{-\infty}^{\varphi(\theta, \xi)} f(x, \theta)\, dx \le 1 - \xi$

there exists a function $\psi(\theta, \xi)$ such that (5) holds. We will choose $\varphi(\theta, \xi)$ and
$\psi(\theta, \xi)$ so that (5) is satisfied and

(6)  $\psi(\theta, \xi) - \varphi(\theta, \xi) \le \bar{\psi}(\theta, \xi) - \bar{\varphi}(\theta, \xi)$

for any value of $\theta$ and for any functions $\bar{\varphi}(\theta, \xi)$ and $\bar{\psi}(\theta, \xi)$ which satisfy (5).

Let $\hat{\theta}_i$ ($i = 1, \ldots, k$) be the maximum likelihood estimate of $\theta_i$ calculated
from the observations $x_1, \ldots, x_n$. We propose the use of the tolerance
limits

(7)  $L = \varphi(\hat{\theta}, \xi)$ and $U = \psi(\hat{\theta}, \xi)$

where the value of the constant $\xi$ has to be properly determined. Problem 1
is solved if we can determine $\xi$ as a function of $\beta$ and $\gamma$ such that

(8)  $P\left\{\int_{\varphi(\hat{\theta}, \xi)}^{\psi(\hat{\theta}, \xi)} f(x, \theta)\, dx \ge \gamma \,\Big|\, \theta\right\} = \beta.$

Problem 2 is solved if we determine $\xi$ as a function of $\beta$, $\lambda$ and N such that

(9)  $P(M \ge \lambda N \mid \theta) = \beta$

where M denotes the number of observations in the second sample which lie
between the tolerance limits $\varphi(\hat{\theta}, \xi)$ and $\psi(\hat{\theta}, \xi)$. The use of tolerance limits
of the form (7) seems to be well justified by the fact that the functions $\varphi(\theta, \xi)$
and $\psi(\theta, \xi)$ satisfy (5) and (6) and that $\hat{\theta}_i$ is an optimum estimate of $\theta_i$
($i = 1, \ldots, k$).

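Now we will derive the large sample distribution of

(10)  $I(\theta, \hat{\theta}, \xi) = \int_{\varphi(\hat{\theta}, \xi)}^{\psi(\hat{\theta}, \xi)} f(x, \theta)\, dx.$

We obviously have

(11)  $I(\theta, \theta, \xi) = \xi.$

We will assume that the limit joint distribution of $\sqrt{n}(\hat{\theta}_1 - \theta_1), \ldots,
\sqrt{n}(\hat{\theta}_k - \theta_k)$ is normal with mean values 0 and non-singular covariance matrix
$\|\sigma_{ij}(\theta)\| = \|c_{ij}(\theta)\|^{-1}$ where $c_{ij}(\theta)$ denotes the expected value of
$-\dfrac{\partial^2 \log f(x, \theta)}{\partial\theta_i\, \partial\theta_j}$
($i, j = 1, \ldots, k$). This is known to be true if $f(x, \theta)$ satisfies some regularity
conditions.³ Furthermore we assume that $\varphi(\theta, \xi)$ and $\psi(\theta, \xi)$ admit continuous
first partial derivatives with respect to $\theta_1, \ldots, \theta_k$ and that $f(x, \theta)$ is a continuous
function of x in the neighborhood of $x = \varphi(\theta, \xi)$ and $x = \psi(\theta, \xi)$. We have

(12)  $\dfrac{\partial I(\theta, \hat{\theta}, \xi)}{\partial\hat{\theta}_i}\Big|_{\hat{\theta} = \theta} = f[\psi(\theta, \xi), \theta]\, \dfrac{\partial\psi(\theta, \xi)}{\partial\theta_i} - f[\varphi(\theta, \xi), \theta]\, \dfrac{\partial\varphi(\theta, \xi)}{\partial\theta_i}.$

Assuming that at least one of the derivatives $\dfrac{\partial I(\theta, \hat{\theta}, \xi)}{\partial\hat{\theta}_i}\Big|_{\hat{\theta} = \theta}$ is not zero, it follows
from our Lemma that
$\sqrt{n}[I(\theta, \hat{\theta}, \xi) - I(\theta, \theta, \xi)] = \sqrt{n}[I(\theta, \hat{\theta}, \xi) - \xi]$ is in the limit normally distributed
with zero mean and variance

(13)  $\sigma^2(\theta, \xi) = \{f[\psi(\theta, \xi), \theta]\}^2 \sum_j \sum_i \dfrac{\partial\psi(\theta, \xi)}{\partial\theta_i} \dfrac{\partial\psi(\theta, \xi)}{\partial\theta_j} \sigma_{ij}(\theta)$
      $\quad - 2 f[\psi(\theta, \xi), \theta]\, f[\varphi(\theta, \xi), \theta] \sum_j \sum_i \dfrac{\partial\psi(\theta, \xi)}{\partial\theta_i} \dfrac{\partial\varphi(\theta, \xi)}{\partial\theta_j} \sigma_{ij}(\theta)$
      $\quad + \{f[\varphi(\theta, \xi), \theta]\}^2 \sum_j \sum_i \dfrac{\partial\varphi(\theta, \xi)}{\partial\theta_i} \dfrac{\partial\varphi(\theta, \xi)}{\partial\theta_j} \sigma_{ij}(\theta).$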

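For any positive $\beta < 1$ denote by $\lambda_\beta$ the value for which

(14)  $\dfrac{1}{\sqrt{2\pi}} \int_{\lambda_\beta}^{\infty} e^{-\frac{1}{2}t^2}\, dt = \beta.$

Then the probability that

(15)  $I(\theta, \hat{\theta}, \xi) \ge \xi + \lambda_\beta\, \dfrac{\sigma(\theta, \xi)}{\sqrt{n}}$

converges with $n \to \infty$ towards $\beta$.
Let

(16)  $\xi(\beta, \gamma, \hat{\theta}) = \gamma - \lambda_\beta\, \dfrac{\sigma(\hat{\theta}, \gamma)}{\sqrt{n}}.$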


3 See for instance J. L. Doob, “Probability and statistics,” Trans. Amer. Math. Soc.,
October, 1934.








If $\sigma(\theta, \xi)$ is continuous in $\theta$ and $\xi$, it follows easily from (15) that the probability
that

(17)  $I[\theta, \hat{\theta}, \xi(\beta, \gamma, \hat{\theta})] \ge \gamma$

holds, converges to $\beta$ with $n \to \infty$. Hence we can summarize our results in the
following

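THEOREM 1: Let $\varphi(\theta, \xi)$ and $\psi(\theta, \xi)$ be two functions satisfying (5) and (6).
Furthermore, let the functions $I(\theta, \hat{\theta}, \xi)$, $\sigma^2(\theta, \xi)$ and $\xi(\beta, \gamma, \hat{\theta})$ be defined by (10),
(13) and (16) respectively. Denote by $\theta_1^0, \ldots, \theta_k^0$ the true values of the parameters.
It is assumed that there exist two positive numbers $\epsilon$ and $\delta$ such that the following
three conditions are fulfilled:

(a) For any point $\theta$ for which $\sum_{i=1}^{k} (\theta_i - \theta_i^0)^2 \le \epsilon$ the limit joint distribution of
$\sqrt{n}(\hat{\theta}_1 - \theta_1), \ldots, \sqrt{n}(\hat{\theta}_k - \theta_k)$, calculated under the assumption that $\theta$ is the
true parameter point, is normal with zero means and a finite non-singular covariance
matrix $\|\sigma_{ij}(\theta)\|$ where $\sigma_{ij}(\theta)$ is a continuous function of $\theta$ in the domain
$\sum_{i} (\theta_i - \theta_i^0)^2 \le \epsilon$.

(b) The partial derivatives $\dfrac{\partial I(\theta, \hat{\theta}, \xi)}{\partial\hat{\theta}_i}\Big|_{\hat{\theta} = \theta}$ ($i = 1, \ldots, k$) are continuous functions
of $\theta$ and $\xi$ in the domain $\sum_{i=1}^{k} (\theta_i - \theta_i^0)^2 \le \epsilon$ and $|\xi - \gamma| < \delta$.

(c) At least one of the partial derivatives $\dfrac{\partial I(\theta^0, \hat{\theta}, \gamma)}{\partial\hat{\theta}_i}\Big|_{\hat{\theta} = \theta^0}$ ($i = 1, \ldots, k$) is not
equal to zero.

Then the probability that

$I[\theta^0, \hat{\theta}, \xi(\beta, \gamma, \hat{\theta})] \ge \gamma$

holds, converges to $\beta$ with $n \to \infty$.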
From Theorem 1 we obtain the following

LARGE SAMPLE SOLUTION OF PROBLEM 1. For large n we can approximate the
lower and upper tolerance limits by
$\varphi[\hat{\theta}, \xi(\beta, \gamma, \hat{\theta})]$ and $\psi[\hat{\theta}, \xi(\beta, \gamma, \hat{\theta})]$ respectively, where $\xi(\beta, \gamma, \hat{\theta})$ is given by (16).

Now we will deal with Problem 2. We distinguish two cases.

(a) $\lim n/N = 0$.

It is easy to see that in this case the solution of Problem 2 is obtained from that
of Problem 1 by substituting $\lambda$ for $\gamma$. Hence for large n the tolerance limits
can be approximated by $\varphi[\hat{\theta}, \xi(\beta, \lambda, \hat{\theta})]$ and $\psi[\hat{\theta}, \xi(\beta, \lambda, \hat{\theta})]$ respectively.
For these tolerance limits condition (2) is fulfilled in the limit, i.e.

$\lim P(M \ge \lambda N \mid \theta_1, \ldots, \theta_k) = \beta.$

(b) The integers n and N approach infinity while $N/n$ remains bounded.

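Denote $\sqrt{n}[I(\theta, \hat{\theta}, \xi) - \xi]$ by u and $\sqrt{N}\left(\dfrac{M(\xi)}{N} - \xi\right)$ by v, where $M(\xi)$ denotes
the number of observations in the second sample which fall between the limits
$\varphi(\hat{\theta}, \xi)$ and $\psi(\hat{\theta}, \xi)$. For any fixed value of u the conditional expected value of
$\dfrac{M(\xi)}{N}$ is given by $\xi + \dfrac{u}{\sqrt{n}}$ and the conditional variance of $\dfrac{M(\xi)}{N}$ is given by
$\dfrac{1}{N}\left(\xi + \dfrac{u}{\sqrt{n}}\right)\left(1 - \xi - \dfrac{u}{\sqrt{n}}\right)$. Hence the conditional expected value of v is
equal to $u\sqrt{N/n}$ and the conditional variance of v is equal to
$\left(\xi + \dfrac{u}{\sqrt{n}}\right)\left(1 - \xi - \dfrac{u}{\sqrt{n}}\right)$. Since the limit distribution of u is normal with zero mean and
standard deviation $\sigma(\theta, \xi)$ given in (13), we find that the limit bivariate distribution
of u and v is given by

(18)  $\dfrac{1}{2\pi\, \sigma(\theta, \xi) \sqrt{\xi(1 - \xi)}} \exp\left\{-\dfrac{u^2}{2\sigma^2(\theta, \xi)} - \dfrac{(v - u\sqrt{N/n})^2}{2\xi(1 - \xi)}\right\} du\, dv.$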





From (18) it follows that the limit distribution of v is normal with zero mean
and variance

(19)  $\sigma_v^2 = \xi(1 - \xi) + \dfrac{N}{n}\, \sigma^2(\theta, \xi) = \dfrac{n\xi(1 - \xi) + N\sigma^2(\theta, \xi)}{n}.$
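The variance in (19) can also be read off from the conditional moments of v given u stated before (18), without integrating the bivariate density. The following is a short check by the law of total variance, using only quantities defined in the text; terms that vanish as n grows are dropped.

```latex
\operatorname{Var}(v)
  = E\bigl[\operatorname{Var}(v \mid u)\bigr] + \operatorname{Var}\bigl(E[v \mid u]\bigr)
  \approx \xi(1-\xi) + \operatorname{Var}\!\Bigl(u\sqrt{N/n}\Bigr)
  = \xi(1-\xi) + \frac{N}{n}\,\sigma^2(\theta,\xi)
  = \frac{n\,\xi(1-\xi) + N\,\sigma^2(\theta,\xi)}{n}.
```

The first term is the binomial variation of the second sample, and the second is the variation of the estimated limits carried over from the first sample.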








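From (19) it follows easily that the probability that

(20)  $\dfrac{M(\xi)}{N} \ge \xi + \lambda_\beta\, \dfrac{\sqrt{n\xi(1 - \xi) + N\sigma^2(\theta, \xi)}}{\sqrt{nN}}$

converges to $\beta$ with $n \to \infty$. Let

(21)  $\xi^*(\beta, \lambda, \hat{\theta}) = \lambda - \lambda_\beta\, \dfrac{\sqrt{n\lambda(1 - \lambda) + N\sigma^2(\hat{\theta}, \lambda)}}{\sqrt{nN}}.$

From (20) it follows that the probability that

$\dfrac{M}{N} \ge \lambda$

converges to $\beta$ with $n \to \infty$. The letter M denotes the number of observations
in the second sample which lie between the limits $\varphi[\hat{\theta}, \xi^*(\beta, \lambda, \hat{\theta})]$ and
$\psi[\hat{\theta}, \xi^*(\beta, \lambda, \hat{\theta})]$.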

We can summarize our results in the following 

THEOREM 2. Let $\varphi(\theta, \xi)$ and $\psi(\theta, \xi)$ be two functions satisfying (5) and (6).
Two samples of size n and N respectively are drawn and the maximum likelihood
estimate $\hat{\theta}$ is calculated from the first sample only. Assume that conditions (a),
(b) and (c) of Theorem 1 are satisfied. Let $\xi(\beta, \gamma, \hat{\theta})$ and $\xi^*(\beta, \lambda, \hat{\theta})$ be defined
by (16) and (21) respectively.

If n and $N/n$ both approach infinity, the probability that $M/N \ge \lambda$ holds, converges
to $\beta$, where M denotes the number of observations in the second sample which lie
between the limits $\varphi[\hat{\theta}, \xi(\beta, \lambda, \hat{\theta})]$ and $\psi[\hat{\theta}, \xi(\beta, \lambda, \hat{\theta})]$.

If n and N approach infinity while $N/n$ remains bounded, the probability that
$M/N \ge \lambda$ holds, converges to $\beta$, where M denotes the number of observations in the
second sample which lie between the limits $\varphi[\hat{\theta}, \xi^*(\beta, \lambda, \hat{\theta})]$ and $\psi[\hat{\theta}, \xi^*(\beta, \lambda, \hat{\theta})]$.

From Theorem 2 we obtain the following


LARGE SAMPLE SOLUTION OF PROBLEM 2. If n and $N/n$ both approach infinity
the lower and upper tolerance limits can be approximated by $\varphi[\hat{\theta}, \xi(\beta, \lambda, \hat{\theta})]$ and
$\psi[\hat{\theta}, \xi(\beta, \lambda, \hat{\theta})]$ respectively. If n and N both approach infinity while $N/n$ remains
bounded, the tolerance limits can be approximated by $\varphi[\hat{\theta}, \xi^*(\beta, \lambda, \hat{\theta})]$ and
$\psi[\hat{\theta}, \xi^*(\beta, \lambda, \hat{\theta})]$ respectively. The expressions $\xi(\beta, \lambda, \hat{\theta})$ and $\xi^*(\beta, \lambda, \hat{\theta})$ are given
by (16) and (21) respectively.


4. The multivariate case. For any positive $\xi < 1$ let $\varphi_i(\theta, \xi)$ and $\psi_i(\theta, \xi)$
($i = 1, \ldots, p$) be p pairs of functions of $\theta$ such that

(22)  $\int_{\varphi_p(\theta, \xi)}^{\psi_p(\theta, \xi)} \cdots \int_{\varphi_1(\theta, \xi)}^{\psi_1(\theta, \xi)} f(x_1, \ldots, x_p, \theta)\, dx_1 \cdots dx_p = \xi.$

If $f(x_1, \ldots, x_p, \theta)$ is a continuous function of $x_1, \ldots, x_p$, functions $\varphi_i(\theta, \xi)$
and $\psi_i(\theta, \xi)$ ($i = 1, \ldots, p$) satisfying (22) certainly exist. As in the univariate
case, there will be infinitely many sets of p pairs of functions $\varphi_i(\theta, \xi)$ and $\psi_i(\theta, \xi)$
which satisfy (22). Since we wish to have tolerance limits as narrow as possible,
we will try to choose the functions $\varphi_i(\theta, \xi)$ and $\psi_i(\theta, \xi)$ so that $\psi_i(\theta, \xi) - \varphi_i(\theta, \xi)$
should be as small as possible. Since it is impossible to minimize all p differences
$\psi_1(\theta, \xi) - \varphi_1(\theta, \xi), \ldots, \psi_p(\theta, \xi) - \varphi_p(\theta, \xi)$ simultaneously, we will have to be
satisfied with some compromise solution. For example, we could minimize
the product $\prod_i [\psi_i(\theta, \xi) - \varphi_i(\theta, \xi)]$ or some other function of the p differences
$\psi_i(\theta, \xi) - \varphi_i(\theta, \xi)$. Another reasonable procedure would be to minimize
$\prod_i [\psi_i(\theta, \xi) - \varphi_i(\theta, \xi)]$ subject to (22) and the condition that for any i and j,

$\dfrac{\psi_i(\theta, \xi) - \varphi_i(\theta, \xi)}{\psi_j(\theta, \xi) - \varphi_j(\theta, \xi)}$

is equal to the ratio of the standard deviation of $x_i$ to that of $x_j$.

Here we will deal with the problem of deriving tolerance limits for the variates
$x_1, \ldots, x_p$ after the functions $\varphi_i(\theta, \xi)$ and $\psi_i(\theta, \xi)$ have been chosen. Since
the theory of the multivariate case is very similar to that of the univariate
case, we will merely outline it briefly.

As tolerance limits for $x_i$ we will use the functions $\varphi_i(\hat{\theta}, \xi)$ and $\psi_i(\hat{\theta}, \xi)$ where
the value of $\xi$ has to be properly determined. Problem 1 is solved if we can
determine $\xi$ as a function of $\beta$ and $\gamma$ so that

(23)  $P\left\{\int_{\varphi_p(\hat{\theta}, \xi)}^{\psi_p(\hat{\theta}, \xi)} \cdots \int_{\varphi_1(\hat{\theta}, \xi)}^{\psi_1(\hat{\theta}, \xi)} f(x_1, \ldots, x_p, \theta)\, dx_1 \cdots dx_p \ge \gamma \,\Big|\, \theta\right\} = \beta.$

Problem 2 is solved if we determine $\xi$ as a function of $\beta$, $\lambda$ and N such that
condition (2) is fulfilled. Let


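(24)  $I(\theta, \hat{\theta}, \xi) = \int_{\varphi_p(\hat{\theta}, \xi)}^{\psi_p(\hat{\theta}, \xi)} \cdots \int_{\varphi_1(\hat{\theta}, \xi)}^{\psi_1(\hat{\theta}, \xi)} f(x_1, \ldots, x_p, \theta)\, dx_1 \cdots dx_p$

and let

(25)  $I_i(\theta, \hat{\theta}, \xi, x_i) = \int_{\varphi_p(\hat{\theta}, \xi)}^{\psi_p(\hat{\theta}, \xi)} \cdots \int_{\varphi_{i+1}(\hat{\theta}, \xi)}^{\psi_{i+1}(\hat{\theta}, \xi)} \int_{\varphi_{i-1}(\hat{\theta}, \xi)}^{\psi_{i-1}(\hat{\theta}, \xi)} \cdots \int_{\varphi_1(\hat{\theta}, \xi)}^{\psi_1(\hat{\theta}, \xi)} f(x_1, \ldots, x_p, \theta)\, dx_1 \cdots dx_{i-1}\, dx_{i+1} \cdots dx_p.$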

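We have

(26)  $\dfrac{\partial I(\theta, \hat{\theta}, \xi)}{\partial\hat{\theta}_i}\Big|_{\hat{\theta} = \theta} = \sum_{s=1}^{p} \dfrac{\partial\psi_s(\theta, \xi)}{\partial\theta_i}\, I_s[\theta, \theta, \xi, \psi_s(\theta, \xi)] - \sum_{s=1}^{p} \dfrac{\partial\varphi_s(\theta, \xi)}{\partial\theta_i}\, I_s[\theta, \theta, \xi, \varphi_s(\theta, \xi)].$

Assuming that the partial derivatives $\dfrac{\partial I(\theta, \hat{\theta}, \xi)}{\partial\hat{\theta}_i}\Big|_{\hat{\theta} = \theta}$ ($i = 1, \ldots, k$) are continuous
functions and that $\dfrac{\partial I(\theta, \hat{\theta}, \xi)}{\partial\hat{\theta}_i}\Big|_{\hat{\theta} = \theta}$ is not zero for at least one value of
i, it follows from our Lemma that $\sqrt{n}[I(\theta, \hat{\theta}, \xi) - I(\theta, \theta, \xi)] = \sqrt{n}[I(\theta, \hat{\theta}, \xi) - \xi]$
is in the limit normally distributed with mean value zero and variance

(27)  $\sigma^2(\theta, \xi) = \sum_{q=1}^{p} \sum_{s=1}^{p} \sum_{j=1}^{k} \sum_{i=1}^{k} \dfrac{\partial\psi_s(\theta, \xi)}{\partial\theta_i} \dfrac{\partial\psi_q(\theta, \xi)}{\partial\theta_j}\, I_s[\theta, \theta, \xi, \psi_s(\theta, \xi)]\, I_q[\theta, \theta, \xi, \psi_q(\theta, \xi)]\, \sigma_{ij}(\theta)$
      $\quad - 2 \sum_{q=1}^{p} \sum_{s=1}^{p} \sum_{j=1}^{k} \sum_{i=1}^{k} \dfrac{\partial\psi_s(\theta, \xi)}{\partial\theta_i} \dfrac{\partial\varphi_q(\theta, \xi)}{\partial\theta_j}\, I_s[\theta, \theta, \xi, \psi_s(\theta, \xi)]\, I_q[\theta, \theta, \xi, \varphi_q(\theta, \xi)]\, \sigma_{ij}(\theta)$
      $\quad + \sum_{q=1}^{p} \sum_{s=1}^{p} \sum_{j=1}^{k} \sum_{i=1}^{k} \dfrac{\partial\varphi_s(\theta, \xi)}{\partial\theta_i} \dfrac{\partial\varphi_q(\theta, \xi)}{\partial\theta_j}\, I_s[\theta, \theta, \xi, \varphi_s(\theta, \xi)]\, I_q[\theta, \theta, \xi, \varphi_q(\theta, \xi)]\, \sigma_{ij}(\theta),$

where $\|\sigma_{ij}(\theta)\|$ is the limit covariance matrix of $\sqrt{n}(\hat{\theta}_1 - \theta_1), \ldots,
\sqrt{n}(\hat{\theta}_k - \theta_k)$.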
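For any positive $\beta < 1$, let $\lambda_\beta$ be the real value defined by the equation

(28)  $\dfrac{1}{\sqrt{2\pi}} \int_{\lambda_\beta}^{\infty} e^{-\frac{1}{2}t^2}\, dt = \beta.$

Let

(29)  $\zeta(\beta, \gamma, \hat{\theta}) = \gamma - \lambda_\beta\, \dfrac{\sigma(\hat{\theta}, \gamma)}{\sqrt{n}}$

and

(30)  $\zeta^*(\beta, \lambda, \hat{\theta}) = \lambda - \lambda_\beta\, \dfrac{\sqrt{n\lambda(1 - \lambda) + N\sigma^2(\hat{\theta}, \lambda)}}{\sqrt{nN}}.$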

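We can easily prove the following two theorems:

THEOREM 3. Let $\varphi_i(\theta, \xi)$ and $\psi_i(\theta, \xi)$ ($i = 1, \ldots, p$) be p pairs of functions
which satisfy (22). Let the functions $I(\theta, \hat{\theta}, \xi)$, $\sigma^2(\theta, \xi)$ and $\zeta(\beta, \gamma, \hat{\theta})$ be defined
by (24), (27) and (29) respectively. Denote by $\theta_1^0, \ldots, \theta_k^0$ the true values of the
parameters $\theta_1, \ldots, \theta_k$. It is assumed that there exist two positive numbers $\epsilon$ and
$\delta$ such that the following three conditions are fulfilled:

(a) For any point $\theta$ for which $\sum_{i=1}^{k} (\theta_i - \theta_i^0)^2 \le \epsilon$ the limit joint distribution of
$\sqrt{n}(\hat{\theta}_1 - \theta_1), \ldots, \sqrt{n}(\hat{\theta}_k - \theta_k)$, calculated under the assumption that $\theta$ is the
true parameter point, is normal with zero means and a finite non-singular covariance
matrix $\|\sigma_{ij}(\theta)\|$ where $\sigma_{ij}(\theta)$ is a continuous function of $\theta$ in the domain
$\sum_{i} (\theta_i - \theta_i^0)^2 \le \epsilon$.

(b) The partial derivatives $\dfrac{\partial I(\theta, \hat{\theta}, \xi)}{\partial\hat{\theta}_i}\Big|_{\hat{\theta} = \theta}$ ($i = 1, \ldots, k$) are continuous functions
of $\theta$ and $\xi$ in the domain $\sum_{i=1}^{k} (\theta_i - \theta_i^0)^2 \le \epsilon$ and $|\xi - \gamma| < \delta$.

(c) At least one of the partial derivatives $\dfrac{\partial I(\theta^0, \hat{\theta}, \gamma)}{\partial\hat{\theta}_i}\Big|_{\hat{\theta} = \theta^0}$ ($i = 1, \ldots, k$) is
not equal to zero.

Then the probability that

$I[\theta^0, \hat{\theta}, \zeta(\beta, \gamma, \hat{\theta})] \ge \gamma$

holds, converges to $\beta$ with $n \to \infty$.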

THEOREM 4. Let $\varphi_i(\theta, \xi)$ and $\psi_i(\theta, \xi)$ ($i = 1, \ldots, p$) be p pairs of functions
which satisfy (22). Two samples of size n and N respectively are drawn and the
maximum likelihood estimate $\hat{\theta}$ is calculated from the first sample only. Assume
that conditions (a), (b) and (c) of Theorem 3 are fulfilled and let $\zeta(\beta, \gamma, \hat{\theta})$ and
$\zeta^*(\beta, \lambda, \hat{\theta})$ be defined by (29) and (30) respectively. Denote by $y_{i\alpha}$ the outcome of
the $\alpha$-th observation on the i-th variate in the second sample.

If n and $N/n$ both approach infinity, the probability that $M \ge \lambda N$ holds converges
to $\beta$, where M denotes the number of different values of $\alpha$ for which

$\varphi_i[\hat{\theta}, \zeta(\beta, \lambda, \hat{\theta})] \le y_{i\alpha} \le \psi_i[\hat{\theta}, \zeta(\beta, \lambda, \hat{\theta})] \qquad (i = 1, \ldots, p).$

If n and N approach infinity while $N/n$ remains bounded, the probability that
$M \ge \lambda N$ holds converges to $\beta$ where M denotes the number of different values of $\alpha$
for which

$\varphi_i[\hat{\theta}, \zeta^*(\beta, \lambda, \hat{\theta})] \le y_{i\alpha} \le \psi_i[\hat{\theta}, \zeta^*(\beta, \lambda, \hat{\theta})] \qquad (i = 1, \ldots, p).$


The proofs of Theorems 3 and 4 are omitted since they are similar to the 
proofs of Theorems 1 and 2. 

From Theorem 3 we obtain the following 

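LARGE SAMPLE SOLUTION OF PROBLEM 1. For large n we can approximate the
lower and upper tolerance limits for $x_i$ by $\varphi_i[\hat{\theta}, \zeta(\beta, \gamma, \hat{\theta})]$ and $\psi_i[\hat{\theta}, \zeta(\beta, \gamma, \hat{\theta})]$
respectively where $\zeta(\beta, \gamma, \hat{\theta})$ is given by (29).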

From Theorem 4 we obtain the following 


LARGE SAMPLE SOLUTION OF PROBLEM 2. If n and $N/n$ approach infinity, the
lower and upper tolerance limits for $x_i$ can be approximated by $\varphi_i[\hat{\theta}, \zeta(\beta, \lambda, \hat{\theta})]$ and
$\psi_i[\hat{\theta}, \zeta(\beta, \lambda, \hat{\theta})]$ respectively. If n and N both approach infinity while $N/n$ remains
bounded, the tolerance limits for $x_i$ can be approximated by $\varphi_i[\hat{\theta}, \zeta^*(\beta, \lambda, \hat{\theta})]$ and
$\psi_i[\hat{\theta}, \zeta^*(\beta, \lambda, \hat{\theta})]$ respectively. The expressions $\zeta(\beta, \lambda, \hat{\theta})$ and $\zeta^*(\beta, \lambda, \hat{\theta})$ are defined
in (29) and (30) respectively.














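5. An example. Let x be a normally distributed variate with mean value $\theta_1$
and standard deviation $\theta_2$, i.e. the probability density function of x is given by

$f(x, \theta_1, \theta_2) = \dfrac{1}{\sqrt{2\pi}\, \theta_2}\, e^{-\frac{1}{2}(x - \theta_1)^2/\theta_2^2}.$

For any positive $\xi < 1$ let $\rho(\xi)$ be the value for which

$\dfrac{1}{\sqrt{2\pi}} \int_{-\rho(\xi)}^{\rho(\xi)} e^{-\frac{1}{2}t^2}\, dt = \xi.$

Then the functions

$\varphi(\theta, \xi) = \theta_1 - \rho(\xi)\theta_2$

and

$\psi(\theta, \xi) = \theta_1 + \rho(\xi)\theta_2$

satisfy conditions (5) and (6).
We have

$\hat{\theta}_1 = \dfrac{x_1 + \cdots + x_n}{n} = \bar{x}$ and $\hat{\theta}_2 = \sqrt{\dfrac{\sum_{\alpha=1}^{n} (x_\alpha - \bar{x})^2}{n}}.$

The variance of $\sqrt{n}(\hat{\theta}_1 - \theta_1)$ is equal to $\theta_2^2$ and the limit variance of $\sqrt{n}(\hat{\theta}_2 - \theta_2)$
is equal to $\frac{1}{2}\theta_2^2$. Since the covariance of $\hat{\theta}_1$ and $\hat{\theta}_2$ is equal to zero, we obtain
from (13)

$\sigma^2(\theta, \xi) = 2\left\{\dfrac{1}{\sqrt{2\pi}\, \theta_2}\, e^{-\frac{1}{2}[\rho(\xi)]^2}\right\}^2 \{\theta_2^2 + \tfrac{1}{2}\theta_2^2[\rho(\xi)]^2\} - 2\left\{\dfrac{1}{\sqrt{2\pi}\, \theta_2}\, e^{-\frac{1}{2}[\rho(\xi)]^2}\right\}^2 \{\theta_2^2 - \tfrac{1}{2}\theta_2^2[\rho(\xi)]^2\} = \dfrac{[\rho(\xi)]^2 e^{-[\rho(\xi)]^2}}{\pi}.$

Hence for large n the tolerance limits satisfying (1) can be approximated by
$\hat{\theta}_1 - \rho(\xi)\hat{\theta}_2$ and $\hat{\theta}_1 + \rho(\xi)\hat{\theta}_2$ respectively where

$\xi = \gamma - \lambda_\beta\, \dfrac{\rho(\gamma)\, e^{-\frac{1}{2}[\rho(\gamma)]^2}}{\sqrt{\pi n}}$

and $\lambda_\beta$ is the value determined by the equation

$\dfrac{1}{\sqrt{2\pi}} \int_{\lambda_\beta}^{\infty} e^{-\frac{1}{2}t^2}\, dt = \beta.$

If n and N are large, the tolerance limits satisfying (2) can be approximated by
$\hat{\theta}_1 - \rho(\xi^*)\hat{\theta}_2$ and $\hat{\theta}_1 + \rho(\xi^*)\hat{\theta}_2$ respectively where

$\xi^* = \lambda - \lambda_\beta \sqrt{\dfrac{\lambda(1 - \lambda)}{N} + \dfrac{[\rho(\lambda)]^2 e^{-[\rho(\lambda)]^2}}{n\pi}}.$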


nr 















STATISTICAL PREDICTION WITH SPECIAL REFERENCE TO THE
PROBLEM OF TOLERANCE LIMITS¹


By S. S. Wilks
Princeton University


1. Introduction. Statistical methodology is becoming recognized in industry
as an effective tool for dealing with certain problems of inspection and quality
control in mass production. Quality control experts have found statistical
methods useful in detecting excessive variation in a given quality characteristic
of a product from a series of observations on the given quality characteristic,
and in isolating the causes of such variations back in the materials or operations
involved in manufacturing the product. By a process of successive detection
and elimination of causes of variability, a controlled state of quality is established.
A practical statistical procedure for establishing a controlled state of quality
has been developed by Shewhart.² More recently, manuals for routine application
of this procedure have been issued by the American Standards Association.³

In this paper we do not propose to go into a discussion of the application of 
the well known Shewhart procedure. The reader may refer to the literature 
mentioned in footnotes 2 and 3 for such discussion. It is sufficient to remark 
that experience shows that the application of this procedure leads to a con- 
trolled state of quality. Such a state of control provides a basis for making 
statistical predictions about measurements on the given quality characteristic 
in future production. 

More specifically, suppose a given quality characteristic of a given product is
measured by a variable X, such that X has a specific value for each individual
product-piece. For example, the product may be a given type of fuse and X
may be the blowing time in seconds. A product-piece would be a single fuse,
and X would take on a value for each fuse. Thus, for a sequence of n fuses
taken from the production line, there would be a corresponding sequence of
values of X, say $X_1, X_2, \ldots, X_n$. If a state of control has been established
with respect to blowing time as measured by X, then the sequence of values
of X will “behave like a random sequence.” By this we mean that the sequence
will be such that we can safely assume that it can be described mathematically
by regarding X as a continuous random variable, i.e., such that there exists some





























1 An expository paper presented at a joint session of the American Mathematical Society 
and the Institute of Mathematical Statistics at Poughkeepsie, September 9, 1942. 

2W. A. Shewhart, Control of Quality of Manufactured Product, D. Van Nostrand Com- 
pany, New York, 1931. 

3 Guide for Quality Control and Control Chart Method of Analyzing Data (1941), and 
Control Chart Method of Controlling Quality During Production (1942), American Standards 
Association, New York. 
probability function f(x) which describes the distribution of values of X, such 
that \(\int_a^b f(x)\,dx\) is the probability that a < X < b for any two real numbers 
a and b. Now, suppose we consider a sequence or sample \(S_1\) of n values of X, 
and let \(X_1\) and \(X_n\) be the smallest and largest values of X in the sequence. 
The types of questions with which we are concerned are the following: If a 
further sample, say \(S_2\), of N values of X is taken, what is the probability P that 
at least \(N_0\) of the values will lie between \(X_1\) and \(X_n\) as determined by \(S_1\)? If 
we choose a given probability \(\alpha\), at least what proportion of values of X in an 
indefinitely large sample \(S_2\) will fall between \(X_1\) and \(X_n\) of \(S_1\) with probability \(\alpha\)? 
What is the probability P' that at least \(N_0\) of the values of \(S_2\) will exceed \(X_1\) 
of \(S_1\)? At least what proportion of values of X in an indefinitely large sample 
\(S_2\) will exceed \(X_1\) with probability \(\alpha\)? These questions suggest several of a 
more general nature which can be treated by methods similar to those which 
will be discussed. For example, instead of taking \(X_1\) and \(X_n\), i.e. the smallest 
and largest items in \(S_1\), as tolerance limits we could use \(X_m\) and \(X_{n-m+1}\). More 


generally, we may define \(100R_\alpha\%\) tolerance limits \(L_1(x_1, x_2, \ldots, x_n)\) and 
\(L_2(x_1, x_2, \ldots, x_n)\) for probability level \(\alpha\) of a sample \(S_1\) of size n from a 
population with distribution f(x) dx as two functions of the X's in \(S_1\) such that the 
probability is \(\alpha\) that at least \(100R_\alpha\%\) of the X's of a further indefinitely large 
sample \(S_2\) (i.e. the population) will lie between \(L_1\) and \(L_2\). Or more briefly 

\[
P\left(\int_{L_1}^{L_2} f(x)\,dx \ge R_\alpha\right) = \alpha.
\]


The same notion clearly applies if \(S_2\) is a finite sample of size N, rather than an 
indefinitely large one. In this case we would be interested in the largest integer 
\(N_\alpha\) such that the probability is at least \(\alpha\) that at least \(100R_\alpha\%\) (\(R_\alpha = N_\alpha/N\)) 
of the X's in \(S_2\) would lie between \(L_1\) and \(L_2\). In most practical situations we 
are able to assume nothing more about f(x) than that it is a probability density 
function. We make only this assumption here. The only functions of the 
values of X in \(S_1\) that we shall consider here in setting tolerance limits are order 
statistics, i.e. the ordered values of X, because the results will then be fairly 
simple and independent of f(x). 



2. A General Probability Formula. It will be convenient perhaps to derive 
a general probability formula at this stage from which we can derive certain 
special cases as we need them. 

Let \(X_1, X_2, \ldots, X_n\) be the n values of X in \(S_1\) arranged in order of 
increasing magnitude. Let \(r_1, r_2, \ldots, r_k\) be integers such that 
\(1 \le r_1 < r_2 < \cdots < r_k \le n\). Let \(x_{r_1}, x_{r_2}, \ldots, x_{r_k}\) be k real numbers. Let 

\[
\int_{-\infty}^{x_{r_1}} f(x)\,dx = p_1, \quad
\int_{x_{r_1}}^{x_{r_2}} f(x)\,dx = p_2, \quad \ldots, \quad
\int_{x_{r_k}}^{\infty} f(x)\,dx = p_{k+1},
\]

from which 

\[
f(x_{r_1})\,dx_{r_1} = dp_1, \quad f(x_{r_2})\,dx_{r_2} = dp_2, \quad \ldots, \quad f(x_{r_k})\,dx_{r_k} = dp_k.
\]


Then assuming \(X_1, X_2, \ldots, X_n\) to be a random sample (ordered) from a 
population with probability element f(x) dx it follows from the multinomial 
distribution law⁴ that the probability of \(x_{r_i} < X_{r_i} < x_{r_i} + dx_{r_i}\) (i = 1, 2, ..., k) 
is given by 

\[
\frac{n!}{(r_1-1)!\,(r_2-r_1-1)!\cdots(r_k-r_{k-1}-1)!\,(n-r_k)!}\;
p_1^{\,r_1-1}\,p_2^{\,r_2-r_1-1}\cdots p_k^{\,r_k-r_{k-1}-1}\,p_{k+1}^{\,n-r_k}\;
dp_1\,dp_2\cdots dp_k \tag{1}
\]

except for terms of order higher than \((dp_1\,dp_2\cdots dp_k)\). Given that 
\(X_{r_1} = x_{r_1}, \ldots, X_{r_k} = x_{r_k}\) in \(S_1\), the conditional probability that 
\(N_1, N_2, \ldots, N_{k+1}\) \(\left(\sum_{i=1}^{k+1} N_i = N\right)\) of the values of X in \(S_2\) will fall in the 
intervals \((-\infty, x_{r_1}), (x_{r_1}, x_{r_2}), \ldots, (x_{r_k}, \infty)\) respectively is by the multinomial law 

\[
\frac{N!}{N_1!\,N_2!\cdots N_{k+1}!}\;p_1^{N_1}\,p_2^{N_2}\cdots p_{k+1}^{N_{k+1}}. \tag{2}
\]


The joint probability law of \(X_{r_1}, X_{r_2}, \ldots, X_{r_k}\) and \(N_1, N_2, \ldots, N_{k+1}\) 
\(\left(\sum_{i=1}^{k+1} N_i = N\right)\) is given by the product of (1) and (2). Integrating this product 
with respect to the x's (i.e. the p's) we find the probability law of the N's to be 

\[
\frac{N!\,n!\,(N_1+r_1-1)!\,(N_2+r_2-r_1-1)!\cdots(N_k+r_k-r_{k-1}-1)!\,(N_{k+1}+n-r_k)!}
{(r_1-1)!\,(r_2-r_1-1)!\cdots(r_k-r_{k-1}-1)!\,(n-r_k)!\,(N+n)!\,N_1!\,N_2!\cdots N_{k+1}!} \tag{3}
\]
which is clearly independent of f(x). This result can be derived by direct com- 
binatorial methods but the present derivation provides a simple proof that the 
result is independent of f(x). 
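
As a numerical companion to formula (3), the following minimal sketch evaluates it exactly with rational arithmetic and checks that the probabilities add up to one over all admissible counts. The function names `prob_counts` and `all_counts`, and the particular values n = 6, N = 4 and ranks (2, 5), are illustrative assumptions and not part of the paper.

```python
from fractions import Fraction
from itertools import product
from math import factorial


def prob_counts(counts, ranks, n):
    """Formula (3): probability that the k+1 cells cut out by the order
    statistics with the given ranks receive `counts` values of a second
    sample of size N = sum(counts).

    counts: (N_1, ..., N_{k+1}); ranks: (r_1, ..., r_k), 1 <= r_1 < ... < r_k <= n.
    """
    N = sum(counts)
    edges = (0,) + tuple(ranks)
    # exponents r_1 - 1, r_2 - r_1 - 1, ..., r_k - r_{k-1} - 1, n - r_k
    gaps = [edges[i + 1] - edges[i] - 1 for i in range(len(ranks))]
    gaps.append(n - ranks[-1])
    num = factorial(N) * factorial(n)
    den = factorial(N + n)
    for N_i, g in zip(counts, gaps):
        num *= factorial(N_i + g)
        den *= factorial(g) * factorial(N_i)
    return Fraction(num, den)


def all_counts(N, cells):
    """Every tuple of `cells` non-negative integers that sums to N."""
    for head in product(range(N + 1), repeat=cells - 1):
        if sum(head) <= N:
            yield head + (N - sum(head),)


# The law of the N's must be a proper probability distribution.
total = sum(prob_counts(c, ranks=(2, 5), n=6) for c in all_counts(4, 3))
print(total)  # 1
```

Because no density f(x) enters the computation at any point, the sketch also illustrates the distribution-free character of the result.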


3. The Problem of One Tolerance Limit. There are problems in quality 
control in which it is important to consider only one tolerance limit. For 
example, in testing breaking strength of steel wire the most significant tolerance 
limit is the lower one. The problem of prediction in this case is as follows: 

4 Which states that if a trial results in one and only one of the mutually exclusive events 
\(E_1, E_2, \ldots, E_k\), the probability P that in a total of n trials \(n_1\) will result in \(E_1\), \(n_2\) in 
\(E_2, \ldots, n_k\) in \(E_k\) \(\left(\sum_{i=1}^{k} n_i = n\right)\), is given by 

\[
P = \frac{n!}{n_1!\,n_2!\cdots n_k!}\;p_1^{n_1}\,p_2^{n_2}\cdots p_k^{n_k},
\]

where \(p_1, p_2, \ldots, p_k\) \(\left(\sum_{i=1}^{k} p_i = 1\right)\) are the probabilities of a single trial resulting in 
\(E_1, E_2, \ldots, E_k\) respectively. 
Suppose the given quality characteristic, as measured by X, is in a state of 
statistical control, and that a sequence of n measurements on X have been 
made. Let \(X_1\) be the smallest of the n values. What is the probability that at 
least \(N_0\) of N further measurements on X will exceed the value \(X_1\) as 
determined by the initial sample? Instead of considering the smallest value of X 
as the lower tolerance limit we could just as easily choose the second smallest, 
or any other small order statistic but the case of the smallest value is perhaps 
of greater practical interest than any other case. The problem of an upper 
tolerance limit is entirely similar to that of a lower tolerance limit. 


TABLE I 
Values of \(N_\alpha\) and \(R_\alpha\) for \(\alpha\) = 0.99 and 0.95 for several combinations of values of N 
and n, and for the problem of one tolerance limit. (For N = ∞, \(R_\alpha\) is denoted 
by \(\bar{R}_\alpha\)) 

                     α = 0.99              α = 0.95
    n       N      N_.99    R_.99       N_.95    R_.95
   10      10        5      .500          7      .700
   10      20       11      .550         14      .700
   10       ∞        -      .631          -      .741
   50      50       44      .880         46      .920
   50     100       90      .900         93      .930
   50       ∞        -      .912          -      .942
  100     100       94      .940         96      .960
  100     200      189      .945        193      .965
  100       ∞        -      .955          -      .970
  500     500      494      .988        496      .992
  500    1000      989      .989        993      .993
  500       ∞        -      .991          -      .994








The probability \(P_1(N_0)\) that \(N_0\) of the N further measurements will exceed the 
smallest value of X in an initially drawn sample of size n is given by (3) for 
k = 1, \(r_1 = 1\), \(N_2 = N_0\), \(N_1 = N - N_0\), i.e. 

\[
P_1(N_0) = \frac{n\,N!\,(N_0+n-1)!}{N_0!\,(N+n)!}. \tag{4}
\]

Values of \(P_1(N_0)\) can be easily calculated by using the recursion formula 

\[
P_1(N_0 - 1) = \frac{N_0}{N_0+n-1}\,P_1(N_0). \tag{5}
\]
For given values of N, n and \(\alpha\) we are interested in the largest integer \(N_\alpha\) for 
which 

\[
\sum_{N_0=N_\alpha}^{N} P_1(N_0) \ge \alpha. \tag{6}
\]

If we set \(N_\alpha/N = R_\alpha\) and set \(\lim_{N\to\infty} R_\alpha = \bar{R}_\alpha\) it can be verified that the value of 
\(\bar{R}_\alpha\) is given by solving the following equation for \(\bar{R}_\alpha\) 

\[
n\int_{\bar{R}_\alpha}^{1} \xi^{\,n-1}\,d\xi = \alpha. \tag{7}
\]

It will be observed that \(n\xi^{\,n-1}\,d\xi\) is to within terms of order \(d\xi\) the probability 
that \(\xi < \int_{X_1}^{\infty} f(x)\,dx < \xi + d\xi\) in samples of size n from a distribution with 
probability element f(x) dx, where \(X_1\) is the smallest value of X in the sample. 
The statistical interpretation of (7) is simply this: The probability is \(\alpha\) that the 
proportion of values of X exceeding \(X_1\) in a further indefinitely large sample is 
at least \(\bar{R}_\alpha\). 

Choosing \(\alpha\) = 0.99 and 0.95, Table I shows values of \(N_\alpha\) and \(R_\alpha\) for various 
combinations of values of n and N for the case of one tolerance limit. The 
table indicates the degree of precision with which predictions about a single 
tolerance limit can be made from a sample of size n about a further sample of 
size N for a few important values of n and N. It should be noted that each 
prediction is made concerning a pair of samples, i.e. an initial sample of size n 
and a further sample of size N, and that the prediction holds for any function f(x). 
Thus as a typical entry we may state that if a sample of 100 is drawn and also 
a sample of 200, then the probability is 0.99 (approx.) that the X's of at least 
189 (or 94.5%) of the cases in the second sample will exceed the smallest X in 
the first sample. 


4. The Problem of Two Tolerance Limits. Again, suppose the given quality 
characteristic as measured by X is in a state of statistical control and that a 
sequence of n measurements are made on X. Let \(X_1\) and \(X_n\) be the smallest 
and largest values of X respectively. The question to be considered now is the 
following: What is the probability that at least \(N_0\) of N further measurements 
on X will lie between the values \(X_1\) and \(X_n\), as determined by the initial sample? 

We proceed by considering the special case of (3) for which k = 2, \(r_1 = 1\), 
\(r_2 = n\), \(N_2 = N_0\), \(N_3 = N - N_0 - N_1\). We find for the joint distribution 
of \(N_1\) and \(N_0\) 

\[
P(N_1, N_0) = \frac{N!\,n!\,(N_0+n-2)!}{(n-2)!\,N_0!\,(N+n)!}. \tag{8}
\]

To obtain the distribution of \(N_0\), we simply sum (8) with respect to \(N_1\) from 
0 to \(N - N_0\), thus obtaining 

\[
P_2(N_0) = n(n-1)(N-N_0+1)\,\frac{N!\,(N_0+n-2)!}{N_0!\,(N+n)!}. \tag{9}
\]

A convenient recursion formula for computation purposes is 

\[
P_2(N_0-1) = \frac{N_0\,(N-N_0+2)}{(N-N_0+1)(N_0+n-2)}\,P_2(N_0). \tag{10}
\]

For given values of N, n and \(\alpha\) we require the largest value of \(N_\alpha\) for which 

\[
\sum_{N_0=N_\alpha}^{N} P_2(N_0) \ge \alpha. \tag{11}
\]

Setting \(N_\alpha/N = R_\alpha\) and \(\lim_{N\to\infty} R_\alpha = \bar{R}_\alpha\) one finds that \(\bar{R}_\alpha\) is given by solving 
the equation⁵ for \(\bar{R}_\alpha\) 

\[
n(n-1)\int_{\bar{R}_\alpha}^{1} \xi^{\,n-2}(1-\xi)\,d\xi = \alpha. \tag{12}
\]

It can be verified that \(n(n-1)\xi^{\,n-2}(1-\xi)\,d\xi\) is to within terms of order \(d\xi\) 
the probability that \(\xi < \int_{X_1}^{X_n} f(x)\,dx < \xi + d\xi\), thus showing that (12) is the 
probability that the proportion of an indefinitely large number of further values 
of X lying between \(X_1\) and \(X_n\) is at least \(\bar{R}_\alpha\). 

Table II gives, for the case of two tolerance limits, values of \(N_\alpha\) and \(R_\alpha\) for 
several important combinations of n and N, including limiting values \(\bar{R}_\alpha\) of \(R_\alpha\) 
for indefinitely large N. 

It should be noted that the problem of two tolerance limits can be immediately 
extended to the case where the lower and upper tolerance limits may be any two 
of the order statistics in S,. 


5. The Problem of Tolerance Limits for Two Quality Characteristics. We 
have thus far devoted our discussion to the problem of tolerance limits for a 
single quality characteristic. The problem of two or more quality character- 
istics can be treated by methods similar to those already used. The simplest 
case is that in which each product-piece under consideration is measured on two 
independent quality characteristics. Suppose the two characteristics are 
measured by X and Y. Let a sample of n product-pieces be taken, assuming a state 
of statistical control has been established, and let \(X_1\) be the smallest of the X 
values and \(Y_1\) the smallest of the Y values. The question with which we are 

5 This limiting case in the problem of tolerance limits as well as that expressed in (7) 
and other similar limiting cases have been considered by the author in an earlier paper: 
“Determination of Sample Sizes for Setting Tolerance Limits,’”’ Annals of Math. Stat. 
Vol. XII (1941) pp. 91-96. 
concerned here is the following: If N further product-pieces are measured on X 
and Y, what is the probability that X > \(X_1\) and Y > \(Y_1\) for \(N_0\) of the pieces? 
Let X and Y be statistically independent and let f(x) and g(y) be the probability 
functions of X and Y respectively. Let \(\int_{-\infty}^{X_1} f(x)\,dx = p\) and \(\int_{-\infty}^{Y_1} g(y)\,dy = q\). 
The probability law of p and q is 

\[
n^2\,(1-p)^{n-1}\,(1-q)^{n-1}\,dp\,dq. \tag{13}
\]


TABLE II 
Values of \(N_\alpha\) and \(R_\alpha\) for \(\alpha\) = .99 and .95 for several combinations of values of N 
and n and for the problem of two tolerance limits. (For N = ∞, \(R_\alpha\) is denoted 
by \(\bar{R}_\alpha\)) 

                     α = 0.99              α = 0.95
    n       N      N_.99    R_.99       N_.95    R_.95
   10      10        4      .400          5      .500
   10      20        8      .400         11      .550
   10       ∞        -      .496          -      .606
   50      50               .840
   50     100               .850
   50       ∞        -      .874          -
  100     100               .890
  100     200               .920
  100       ∞        -      .935          -
  500     500               .982
  500    1000      985      .985        989      .989
  500       ∞        -      .987          -      .991





In a further sample of size N the probability that for \(N_0\) of the cases, X > \(X_1\) 
and Y > \(Y_1\), \(X_1\) and \(Y_1\) being determined by the first sample, is 

\[
\frac{N!}{N_0!\,(N-N_0)!}\,[(1-p)(1-q)]^{N_0}\,[1-(1-p)(1-q)]^{N-N_0}. \tag{14}
\]

The joint probability law of \(N_0\), p and q is given by the product of (13) and (14). 
Integrating this product with respect to p and q we obtain as the probability 
law of \(N_0\), 

\[
P_3(N_0) = n^2\binom{N}{N_0}\sum_{i=0}^{N-N_0}\binom{N-N_0}{i}\frac{(-1)^i}{(n+N_0+i)^2}. \tag{15}
\]
For given values of N, n and \(\alpha\) it is important, as before, to determine \(N_\alpha\) as 
the largest integer for which 

\[
\sum_{N_0=N_\alpha}^{N} P_3(N_0) \ge \alpha. \tag{16}
\]

Setting \(N_\alpha/N = R_\alpha\) and \(\lim_{N\to\infty} R_\alpha = \bar{R}_\alpha\) one finds \(\bar{R}_\alpha\) to be given by solving the 
following equation for \(\bar{R}_\alpha\) 

\[
-n^2\int_{\bar{R}_\alpha}^{1} \xi^{\,n-1}\log\xi\,d\xi = \alpha. \tag{17}
\]

The expression \(-n^2\xi^{\,n-1}\log\xi\,d\xi\) is simply the probability that 
\(\xi < \left(\int_{X_1}^{\infty} f(x)\,dx\right)\left(\int_{Y_1}^{\infty} g(y)\,dy\right) < \xi + d\xi\) to within terms of order \(d\xi\), where the 
product of the two integrals is the 
proportion of the population pairs (X, Y) for which X > \(X_1\) and Y > \(Y_1\). 

In the problem of two tolerance limits for each quality characteristic, as 
determined by an initial sample of size n, we calculate the probability that \(N_0\) 
members of a further sample of size N will fall within the two sets of tolerance limits, 
with respect to the two characteristics. The problem is similar to that for 
one tolerance limit for each of two quality characteristics. For this case, we 
find corresponding to (15), (16), (17), respectively, the following: 

\[
P_4(N_0) = n^2(n-1)^2\binom{N}{N_0}\sum_{i=0}^{N-N_0}\binom{N-N_0}{i}\frac{(-1)^i}{(N_0+n-1+i)^2\,(N_0+n+i)^2}, \tag{18}
\]

and 

\[
\sum_{N_0=N_\alpha}^{N} P_4(N_0) \ge \alpha, \tag{19}
\]

and 

\[
n^2(n-1)^2\int_{\bar{R}_\alpha}^{1} \xi^{\,n-2}\,[\,2(\xi-1) - (\xi+1)\log\xi\,]\,d\xi = \alpha. \tag{20}
\]

The derivations of results analogous to (15), (16), (17), (18), (19), (20) for 
tolerance limits defined by other order statistics than least and greatest and 
also for more than two independent⁶ quality characteristics are straightforward. 
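
A minimal sketch of formulas (15) and (17) for one lower limit on each of two independent characteristics is given below. Exact rational arithmetic is used for (15) because the alternating sum loses accuracy in floating point. The function names are hypothetical, and the closed form \(1 - R^{n}(1 - n\log R)\) used for (17) is obtained by integrating by parts; both are assumptions of this illustration rather than expressions printed in the paper.

```python
from fractions import Fraction
from math import comb, log


def p3(N0, N, n):
    """Formula (15): probability that exactly N0 of N further pieces have
    X > X_1 and Y > Y_1, evaluated exactly."""
    s = sum(Fraction((-1) ** i * comb(N - N0, i), (n + N0 + i) ** 2)
            for i in range(N - N0 + 1))
    return n * n * comb(N, N0) * s


def two_char_R_bar(n, alpha, tol=1e-12):
    """Solve (17) for the limiting proportion by bisection."""
    def prob_at_least(R):
        # -n**2 * integral from R to 1 of xi**(n-1) * log(xi)
        return 1 - R ** n * (1 - n * log(R))

    lo, hi = 0.0, 1.0            # prob_at_least decreases from 1 to 0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if prob_at_least(mid) >= alpha:
            lo = mid
        else:
            hi = mid
    return lo


# (15) must define a probability distribution over N0 = 0, 1, ..., N.
print(sum(p3(k, 6, 5) for k in range(7)))  # 1
```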


6. Further Remarks and Discussion. For a given set of tolerance limits on a 
random variable X as determined by an initial sample of size n, we have dis- 
cussed the problem of predicting, with a given degree of probability, at least 
what proportion of values of X in a further sample (finite or indefinitely large) 
will lie between these tolerance limits. We have obtained theoretical results 


6 In a paper to appear in a forthcoming issue of the Annals of Math. Stat., A. Wald has 
shown how to set up tolerance limits for the case of two or more statistically dependent 
variables. 
which depend only on the assumption that X is a continuous random variable 
with some probability element f(x) dx, where f(x) is not assumed known. 

It should be emphasized that the concept of a random variable is very broad 
in the sense that X may be a random variable determined as a result of 
calculations on other random variables. For example, X may be the difference, 
product, or ratio of two random variables, or the average or any other 
"reasonable" function of several random variables which may be of interest in any given 
situation. Thus, on the basis of an initial sample of differences of two random 
variables, we may set up tolerance limits of differences and make predictions, 
for a given probability level as to how many differences in a further sample 
of differences will lie between these tolerance limits. Similarly for products, 
ratios, and other functions of random variables. 

From the point of view of practical application, we should again note that the 
mathematical assumption that X is a random variable means that a state of 
statistical control as described in §1 must exist in the measurements to which 
the tolerance limit prediction theory is to be applied. In practice X is often a 
discrete variable, i.e. one which can take on only certain isolated values. For 
example, if X is the number of defective product-pieces in a drawing of one 
product-piece, X is either 0 or 1, depending on whether the piece was non- 
defective or defective. Our theory would not be applicable to such a case. 
However, if we take as a new variable the average value of X for several product- 
pieces, we then obtain a variable that is continuous enough for the tolerance 
limit theory to be applicable for all practical purposes. 

Finally, we remark that although we have used, as concrete examples, situa- 
tions in mass production engineering, the notions of tolerance limits and predic- 
tions within tolerance limits which have been discussed apply equally well to 
situations in any branch of applied science where measurements are made and 
used as a basis for predictions concerning future measurements. 


7. Summary. After a state of statistical control has been established with 
respect to a quality characteristic of product-pieces in mass production by the 
standard statistical quality control methods developed and refined by Shewhart 
and others, there remains the problem of determining the accuracy of predic- 
tions as to how many future product-pieces will fall within tolerance limits 
specified by measurements on product-pieces already produced under the given 
state of control. This problem and some of its extensions are discussed in the 
present paper. 

More specifically, suppose an initial sample of n product-pieces, manufactured 
under a given state of statistical control, are measured with respect to a given 
quality characteristic. Let X be a variable which measures the given 
characteristic, so that X has a definite value for each product-piece. Let \(X_1\) be the 
smallest and \(X_n\) the largest value of X which occurs in the initial sample. Now 
consider a further sample of size N. The following problems of prediction 
relating to the second sample from information yielded by the initial sample are 
considered: (1) What is the probability that at least \(N_0\) values of X in the second 
sample will exceed the tolerance limit \(X_1\) set by the first sample? (2) What is the 
probability that at least \(N_0\) values of X in the second sample will lie between the 
two tolerance limits \(X_1\) and \(X_n\) set by the first sample? (3) For given values 
of n and N and \(\alpha\) (e.g., .99 or .95), what is the largest integer \(N_\alpha\) such that the 
probability is at least \(\alpha\) that \(N_0 \ge N_\alpha\)? (4) What is the limiting value of 
\(N_\alpha/N = R_\alpha\) as N increases indefinitely? Tables of values of \(N_\alpha\) and \(R_\alpha\) are given 
for each of the two problems (1) and (2), for several important combinations of 
values of n and N and for \(\alpha\) = .99 and .95. 

Problems similar to (1), (2) and (3) are discussed for the case in which toler- 
ance limits are placed on two or more quality characteristics simultaneously. 

The generality of the theory of tolerance limits and how it applies to differ- 
ences, products and ratios and other functions of two or more random variables 
are briefly discussed. 











GENERALIZED POISSON DISTRIBUTION 


By F. E. SATTERTHWAITE 


Aetna Life Insurance Company 


1. Introduction. The Poisson distribution is one of the most fundamental 
of statistical distributions. It is the distribution law for the number of events 
if the probability of an event happening in any infinitesimal unit of time is inde- 
pendent of the probability of its happening in any other unit of time. Fre- 
quently when we analyze statistics which obey the Poisson law it is desirable to 
give varying weights to the different events instead of considering them all of 
equal value. Such is the case in analyzing insurance statistics where the events 
are the claims received by the office and the weights are the cost of the claim 
to the company. We shall now show how the Poisson distribution can be 
generalized so as to be adequate for such an analysis. 


2. First development. Let \(f(x, \alpha)\) be the distribution function of the weights 
assigned to the events where the variable, x, refers to the weight and the 
variable, \(\alpha\), refers to time. The characteristic function of \(f(x, \alpha)\) is 

\[
\varphi(t, \alpha) = \int e^{itx} f(x, \alpha)\,dx.
\]

Also let \(p(\alpha)\,d\alpha\) be the probability that an event will occur in the infinitesimal 
unit of time, \(\alpha\) to \(\alpha + d\alpha\). If y represents the sum of the weights, the 
distribution function of y for this unit of time is 

\[
F_{d\alpha}(y, \alpha) =
\begin{cases}
1 - p(\alpha)\,d\alpha, & y = 0 \\
f(y, \alpha)\,p(\alpha)\,d\alpha, & y > 0.
\end{cases} \tag{1}
\]

The characteristic function of this distribution is 

\[
\begin{aligned}
\Phi_{d\alpha}(t, \alpha) &= e^{it\cdot 0}\,(1 - p(\alpha)\,d\alpha) + p(\alpha)\,d\alpha \int e^{ity} f(y, \alpha)\,dy \\
&= 1 - p(\alpha)\,d\alpha + p(\alpha)\,\varphi(t, \alpha)\,d\alpha \\
&= e^{-p(\alpha)\,d\alpha + p(\alpha)\,\varphi(t, \alpha)\,d\alpha}.
\end{aligned} \tag{2}
\]

In forming equations (1) and (2) we ignore infinitesimals of orders higher than 
the first in the \(d\alpha\). 
The expected number of events in the period of time from \(\alpha_1\) to \(\alpha_2\) is 

\[
P = \int_{\alpha_1}^{\alpha_2} p(\alpha)\,d\alpha,
\]

and the mean distribution of weights during the same period of time is 

\[
f(x) = \int_{\alpha_1}^{\alpha_2} [p(\alpha)/P]\,f(x, \alpha)\,d\alpha.
\]

The characteristic function of this mean distribution of weights is 

\[
\varphi(t) = \int e^{itx} f(x)\,dx = \int_{\alpha_1}^{\alpha_2} [p(\alpha)/P]\,\varphi(t, \alpha)\,d\alpha.
\]


These equations are based on the assumption that the probability of an event 
occurring in any unit of time is independent of the probability of its occurrence 
in any other unit of time and also the assumption that the weights assigned to 
each event are independent. These assumptions are implied in all that follows. 

Since the characteristic function of the sum of independent variables is equal 
to the product of the respective characteristic functions, the characteristic 
function of the sum of the weights during the period of time, \(\alpha_1\) to \(\alpha_2\), is 

\[
\begin{aligned}
\Phi(t) &= \prod \Phi_{d\alpha}(t, \alpha) \\
&= e^{-\int p(\alpha)\,d\alpha + \int p(\alpha)\,\varphi(t, \alpha)\,d\alpha} \\
&= e^{-P(1 - \varphi(t))}.
\end{aligned} \tag{3}
\]

Applying the Fourier transformation, the distribution function of the sum of 
the weights is 

\[
F(y) = \frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-ity}\,\Phi(t)\,dt.
\]

Equation (3) gives a convenient method for defining a generalized Poisson 
distribution. Any distribution which has a characteristic function in the form 
of \(\Phi(t)\) where \(\varphi(t)\) is the characteristic function of an arbitrary distribution will 
have all the properties of a generalized Poisson distribution. 


3. Second development. If we let \(\varphi(t)\) represent the characteristic function 
of an arbitrary distribution, the characteristic function of the sum of n 
independent items obeying such a distribution law is \(\Phi_n(t) = [\varphi(t)]^n\). If instead of 
considering n to be a fixed quantity we assume that it is an independent 
statistical variable obeying the Poisson distribution law with mean P, the 
characteristic function of the sum, y, of the items of the sample becomes 

\[
\Phi(t) = \sum_{n=0}^{\infty} \frac{1}{n!}\,P^n\,[\varphi(t)]^n\,e^{-P} = e^{-P(1 - \varphi(t))}.
\]

Therefore y is seen to obey the generalized Poisson distribution law. 
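
The agreement between the Poisson-weighted series of the second development and the closed form of equation (3) can be checked numerically. In the sketch below the exponential weight distribution, the truncation at 60 terms and all function names are illustrative assumptions, chosen only because the exponential characteristic function has a simple expression.

```python
import cmath
from math import exp, factorial


def phi_exponential(t, mean=1.0):
    """Characteristic function of an exponential weight distribution
    (an illustrative choice, not taken from the paper)."""
    return 1 / (1 - 1j * t * mean)


def Phi_closed(t, P, phi):
    """Closed form exp(-P (1 - phi(t))) of equation (3)."""
    return cmath.exp(-P * (1 - phi(t)))


def Phi_series(t, P, phi, terms=60):
    """Partial sum of the series over n of P**n e**(-P) / n! * phi(t)**n."""
    return sum(exp(-P) * P ** n / factorial(n) * phi(t) ** n
               for n in range(terms))


gap = abs(Phi_series(0.7, 3.3, phi_exponential)
          - Phi_closed(0.7, 3.3, phi_exponential))
print(gap < 1e-12)  # True
```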


4. Properties. The generalized Poisson distribution preserves the unique and 
very important property of the Poisson distribution that nowhere in its 
development is it necessary to make any assumptions regarding homogeneity. The 
only requirement is that the occurrence of and weight assigned to any event 
shall be independent of the occurrence of or weight assigned to any other event. 

The distribution of the sum of the weights is a function of the expected number 
of events, P, and of the mean distribution of weights, f(x), alone. It is inde- 
pendent of the way in which P and f(x) are made up. Thus, if we are studying 
the distribution of the sum of the weights over a period of a year and if P and 
f(x) vary with the seasons, the distribution of y is no different than it would be 
if P and f(x) were constant. It is only necessary that the f(x)’s for the different 
seasons be weighted in proportion to the expected number of events in deter- 
mining the mean f(x). 

Note also that in the first development it is not necessary that the variable, \(\alpha\), 
refer to time. It could just as well refer to different classes of events dis- 
tinguished on any other basis. Therefore, heterogeneous material may be com- 
bined in an analysis if it is possible to determine the appropriate mean distri- 
bution of weights. 

For a given weight distribution the generalized Poisson distribution for an 
expected number of events, nP, is identical with the distribution of the sum of n 
independent items each of which obeys a generalized Poisson distribution with 
P expected events. 

Because of the property described in the preceding paragraph it is immediately 
apparent that a generalized Poisson distribution obeys the law of large numbers. 
As the number of expected events increases the distribution approaches the 
normal distribution. 

5. Moments. The moments of a generalized Poisson distribution are 
functions of the moments of the underlying weight distribution. By differentiating 
the characteristic function we obtain the following formulas in which the 
pre-subscript, 0, refers to the moments of the weight distribution, f(x): 

\[
\begin{aligned}
\mu'_1 &= P\,{}_0\mu'_1 = m \\
\mu_2 &= P\,{}_0\mu'_2 = \sigma^2 \\
\mu_3 &= P\,{}_0\mu'_3 \\
\mu_4 &= P\,{}_0\mu'_4 + 3\,(P\,{}_0\mu'_2)^2.
\end{aligned}
\]













The above formulas may be verified through general reasoning by considering 
the moments of the distribution, \(F_{d\alpha}(y, \alpha)\) (see equation (1)). This 
distribution refers to an infinitesimal unit of time and all the moments about zero are 
infinitesimals of the first order. In passing from the moments about zero to 
the moments about the mean the corrections are all infinitesimals of at least the 
second order. Therefore, the corrections may be ignored and the moments 
about the mean may be considered to be equal to those about zero. The above 
formulas follow if we take a sample of size \(P/(p\,d\alpha)\) from this population. 
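
The moment relations of this section translate directly into a few lines of code. The sketch below assumes the standard compound-Poisson relations (mean and central moments expressed through the moments of the weights about zero); the function name is hypothetical. With every weight equal to one, the result reduces to the moments of the ordinary Poisson law, which serves as a check.

```python
def generalized_poisson_moments(P, raw):
    """Mean and central moments of the sum of weights.

    P   : expected number of events
    raw : (mu1', mu2', mu3', mu4'), moments of the weight distribution
          about zero
    """
    m1, m2, m3, m4 = raw
    mean = P * m1
    var = P * m2
    mu3 = P * m3
    mu4 = P * m4 + 3 * (P * m2) ** 2
    return mean, var, mu3, mu4


# Unit weights: ordinary Poisson with mean 2, whose fourth central
# moment is 2 + 3 * 2**2 = 14.
print(generalized_poisson_moments(2, (1, 1, 1, 1)))  # (2, 2, 2, 14)
```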

In order to obtain Pearson's moment functions for a generalized Poisson 
distribution for any given mean value it is convenient to calculate the following 
parameters of the weight distribution: 

\[
\begin{aligned}
{}_0m &= {}_0\mu'_1 \\
{}_0\sigma^2 &= {}_0\mu'_2 / {}_0m \\
{}_0\beta_1 &= ({}_0\mu'_3 / {}_0m)^2 / {}_0\sigma^6 \\
{}_0(\beta_2 - 3) &= ({}_0\mu'_4 / {}_0m) / {}_0\sigma^4.
\end{aligned} \tag{4}
\]

The Pearson moment functions then take the convenient forms: 

\[
\begin{aligned}
\sigma^2/m^2 &= {}_0\sigma^2/m \\
\beta_1 &= {}_0\beta_1/m \\
(\beta_2 - 3) &= {}_0(\beta_2 - 3)/m.
\end{aligned} \tag{5}
\]


6. Further generalizations. Often the expected number of events is not 
known but can be estimated to a greater or less degree of accuracy. In such a 
case it is convenient to assume that P is a statistical variable distributed about 
some expected value, say P'. A Type III distribution, 

\[
g(P) = \frac{1}{\Gamma(b)}\left(\frac{b}{P'}\right)^{b} P^{\,b-1}\,e^{-bP/P'},
\]

will generally be as satisfactory as any to assume for P. The parameter, b, 
can be chosen to give any desired standard deviation. The characteristic 
function of the distribution of the sum of the weights under these conditions becomes 

\[
\Phi'(t) = \int e^{-P(1 - \varphi(t))}\,g(P)\,dP = \left[1 + \frac{P'(1 - \varphi(t))}{b}\right]^{-b}.
\]
The second development suggests another generalization. Instead of 
assuming that the number of events, n, is distributed in accord with the Poisson 
distribution, we may assume any discrete, non-negative distribution, h(n). 
The distribution function for the sum of the weights is then 

\[
F'(y) = \sum_n h(n)\,f(y, n)
\]

where f(y, n) is the distribution function for the sum of n independent weights. 
The variance, \(\sigma^2\), of this distribution is given by the formula, 

\[
\frac{\sigma^2}{m^2} = \frac{{}_n\sigma^2}{{}_nm^2} + \frac{1}{{}_nm}\,\frac{{}_0\sigma^2}{{}_0m^2},
\]

where m refers to the mean, n refers to the distribution h(n), and 0 refers to the 
weight distribution. Some writers have assumed that statistics of this type are 
distributed as a product. Such an assumption is incorrect and causes an over- 


9 


° ° 2 2 2 
statement of the variance to the amount of ,m-om'-,0 ‘oo. 


7. Application. In Table I is shown the distribution of claims under a 
certain plan of group sickness and accident insurance. The parameters, (4), for 
this distribution are 

\[
{}_0m = 3.62, \quad {}_0\sigma^2 = 8.1, \quad {}_0\beta_1 = 14, \quad {}_0(\beta_2 - 3) = 15. \tag{6}
\]

This distribution is in terms of weeks per claim. The insurance company is 
interested in the financial cost per claim. A study shows that the distribution 
of the rate of weekly indemnity to which different classes of employees are 
entitled has the average parameters, 

\[
{}_1m = 15.25, \quad {}_1\sigma^2 = 16.5, \quad {}_1\beta_1 = 20, \quad {}_1(\beta_2 - 3) = 25. \tag{7}
\]


Since the moment about zero of the product of independent statistics is equal 
to the product of the moments, it is permissible to multiply together the 
corresponding parameters of (6) and (7) to obtain the average parameters for the 
distribution of the financial cost per claim. These are 

TABLE I 

  Nearest Duration of Claim in Weeks    Number of Claims
                  0                           197
                  1                           418
                  2                           173
                  3                           109
                  4                            84
                  5                            58
                  6                            45
                  7                            35
                  8                            27
                  9                            24
                 10                            20
                 11                            17
                 12                            14
                 13                           128


\[
{}_2m = 55.2, \quad {}_2\sigma^2 = 134, \quad {}_2\beta_1 = 280, \quad {}_2(\beta_2 - 3) = 375.
\]

In order to study the distribution of cost under a group of policies for each of 
which $180 in claims is expected, we apply equations (5) to obtain the 
parameters, 

\[
\sigma^2/m^2 = .74, \quad \beta_1 = 1.6, \quad \beta_2 - 3 = 2.1. \tag{8}
\]

Since the expected number of claims is 

\[
P = 180/55.2 = 3.3
\]

the probability that there will not be any claims under a policy is 

\[
h(0) = \frac{1}{0!}\,(3.3)^0\,e^{-3.3} = .037.
\]

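
The parameters quoted in this section can be reproduced from Table I with a short calculation. The sketch below is an illustration under two assumptions: the final class of the table is treated as exactly 13 weeks, and the parameters are those defined in (4), that is, ratios of moments about zero. The weekly-indemnity parameters (7) are copied from the text, and all variable names are hypothetical.

```python
from math import exp

weeks = list(range(14))
claims = [197, 418, 173, 109, 84, 58, 45, 35, 27, 24, 20, 17, 14, 128]


def raw_moment(r):
    """r-th moment about zero of the duration distribution in Table I."""
    return sum(c * w ** r for w, c in zip(weeks, claims)) / sum(claims)


# Parameters (4) of the duration distribution.
m0 = raw_moment(1)
sigma2_0 = raw_moment(2) / m0
beta1_0 = (raw_moment(3) / m0) ** 2 / sigma2_0 ** 3
kurt_0 = (raw_moment(4) / m0) / sigma2_0 ** 2

# Weekly-indemnity parameters (7), as quoted in the text.
m1, sigma2_1, beta1_1, kurt_1 = 15.25, 16.5, 20, 25

# Parameters of the cost per claim: products of corresponding parameters.
m2 = m0 * m1
sigma2_2 = sigma2_0 * sigma2_1

P = 180 / m2          # expected number of claims for $180 of expected cost
no_claims = exp(-P)   # Poisson probability of zero claims

print(round(m0, 2), round(sigma2_0, 1), round(beta1_0), round(kurt_0))
# 3.62 8.1 14 15
```

With unrounded intermediate values the probability of no claims comes out slightly above the .037 obtained in the text from the rounded P = 3.3.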
Adjusting the parameters, (8), to remove the zero claims and choosing the scale 
so as to express the results as loss ratios gives the parameters, 

\[
m = 61.6\%, \quad \sigma = 52.8\%, \quad \beta_1 = 1.57, \quad \beta_2 = 4.90.
\]

A Pearson Type I curve fitted to these parameters intersects the axis well below 
the zero point. Therefore \(\beta_2\) was reduced to 4.59 which gives the expected 
distribution shown in Table II. 


Table II also shows the actual distribution of loss ratios experienced by one 
of the larger group insurance carriers under policies in this class. The 
Chi-square test for goodness of fit gives, 

\[
\chi^2 = 23, \quad 14 \text{ degrees of freedom},
\]

which corresponds to a probability of 5 per cent. Thus it is apparent that 
theory and experience are in fair agreement considering that no allowance was 
made for the lack of homogeneity "between policies." (This should not be 
confused with the homogeneity "within policies" covered in the theory.) 

TABLE II 
Experience under Group Sickness and Accident Insurance Policies 

  Ratio of Losses          Number of Policies
    to Premiums           Expected      Actual
        0                    18           11
     .01- .09                47           37
     .10- .19                53           45
     .20- .29                50           56
     .30- .39                45           38
     .40- .49                41           47
     .50- .59                36           39
     .60- .69                32           41
     .70- .79                28           37
     .80- .89                24           20
     .90- .99                21           29
    1.00-1.19                32           30
    1.20-1.39                23           22
    1.40-1.59                17           22
    1.60-1.99                19           14
    2.00 and over            11            9
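
The goodness-of-fit statistic can be recomputed from the two columns of Table II. The sketch below uses the expected counts as printed, which are rounded to whole policies; under that assumption the statistic comes out near 21.5 rather than the quoted 23, a difference that is presumably due to the rounding. The degrees of freedom are taken from the text.

```python
expected = [18, 47, 53, 50, 45, 41, 36, 32, 28, 24, 21, 32, 23, 17, 19, 11]
actual = [11, 37, 45, 56, 38, 47, 39, 41, 37, 20, 29, 30, 22, 22, 14, 9]

# Pearson's statistic: sum over classes of (actual - expected)^2 / expected.
chi2 = sum((a - e) ** 2 / e for a, e in zip(actual, expected))
dof = 14  # as stated in the text

print(sum(expected), sum(actual), round(chi2, 1))
```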

If the expected number of events is small, especially if the weight distribution 
is irregular or discrete, it is sometimes advisable to use the following method: 

1. Use summation or approximate integration to obtain the distribution, 
f(y, n), of the sum of n independent weights for n = 1, 2, 3, and 4. The 
formula is 

\[
f(y, n+1) = \int_0^y f(x)\,f(y - x, n)\,dx.
\]

2. Determine the generalized Poisson distribution for P, the expected number 
of events, equal to some small number, say 1/4. The formula is 

\[
F(y, P) = \sum_n \frac{1}{n!}\,P^n\,e^{-P}\,f(y, n).
\]

3. Use summation or approximate integration to obtain F(y, P) for P = 1/2, 
1, 2, 4, ... by the formula 

\[
F(y, 2P) = \int_0^y F(x, P)\,F(y - x, P)\,dx.
\]

4. If the calculations are carried on from both tails and if the results are 
plotted on probability graph paper, it is often possible to fill in the central 
sections by interpolation. Such interpolations should be adjusted to reproduce the 
correct mean. This method is illustrated in fig. 1 in the case of surgical fee 
insurance. 

Fig. 1. Surgical Fee Insurance. Broken line: distribution, f(y, n), of the sum of n 
independent claims. Solid line: distribution, F(y, P), of the sum of the claims when 
P claims are expected. The average claim is $50. 

Example: If the expected claims under a policy are $100 (P = 2) and if the actual claims 
are $490, the probability of an experience as bad as this occurring because of chance factors 
is 0.1%. 
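
Steps 1 to 3 of the method can be sketched for a discrete weight distribution, where the integrals become sums (discrete convolutions). Everything specific in the sketch is an assumption made for illustration: the weights 1, 2, 3 with probabilities 0.5, 0.3, 0.2, the starting value P = 1/4, the truncation at n = 4, the inclusion of the n = 0 term as a unit mass at y = 0, and the function names.

```python
from math import exp, factorial


def convolve(a, b):
    """Distribution of the sum of two independent integer-valued variables."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def sums_of_weights(weight_pmf, n_max):
    """Step 1: f(., n) for n = 0, ..., n_max (n = 0 is a unit mass at 0)."""
    dists = [[1.0]]
    for _ in range(n_max):
        dists.append(convolve(dists[-1], weight_pmf))
    return dists


def small_P_distribution(weight_pmf, P, n_max=4):
    """Step 2: sum over n of P**n e**(-P) / n! * f(., n), truncated at n_max."""
    dists = sums_of_weights(weight_pmf, n_max)
    out = [0.0] * len(dists[-1])
    for n, d in enumerate(dists):
        w = exp(-P) * P ** n / factorial(n)
        for y, p in enumerate(d):
            out[y] += w * p
    return out


def double(F):
    """Step 3: F(., 2P) as the convolution of F(., P) with itself."""
    return convolve(F, F)


weight_pmf = [0.0, 0.5, 0.3, 0.2]   # hypothetical weights 1, 2, 3
F = small_P_distribution(weight_pmf, 0.25)
for _ in range(3):                   # P = 0.5, 1, 2
    F = double(F)

mean = sum(y * p for y, p in enumerate(F))
print(round(sum(F), 3), round(mean, 2))  # total mass and mean for P = 2
```

The mean of the result should be close to P times the mean weight, here 2 x 1.7 = 3.4; the small shortfall comes from truncating the series at n = 4 in step 2.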


8. Summary. In this paper the Poisson distribution is generalized to allow 
for the assignment of varying weights to events when the number of events 
follows the Poisson law. The ability of the Poisson distribution to handle 
heterogeneous data is preserved in the generalization. An example is given 
showing that the distribution of certain insurance statistics agrees with that pre- 
dicted by the theory. 













THE CONSTRUCTION OF ORTHOGONAL LATIN SQUARES¹ 


By Henry B. Mann² 
Columbia University 


A Latin square is an arrangement of m variables \(x_1, x_2, \ldots, x_m\) into m rows 
and m columns such that no row and no column contains any of the variables 
twice. Two Latin squares are called orthogonal if when one is superimposed 
upon the other every ordered pair of variables occurs once in the resulting 
square. 

The rows of a Latin square are permutations of the row \(x_1, x_2, \ldots, x_m\). Let 
\(P_i\) be the permutation which transforms \(x_1, x_2, \ldots, x_m\) into the ith row of the 
Latin square. Then \(P_i P_j^{-1}\) leaves no variable unchanged for \(i \ne j\). For 
otherwise one column would contain a variable twice. On the other hand each set of 
m permutations \(P_1, P_2, \ldots, P_m\) such that \(P_i P_j^{-1}\) leaves no variable unchanged 
generates a Latin square. We may therefore identify every Latin square with 
a set of m permutations \((P_1, P_2, \ldots, P_m)\) such that \(P_i P_j^{-1}\) leaves no variable 
unchanged. 


Now let (P_1, P_2, ..., P_m), (Q_1, Q_2, ..., Q_m) be a pair of orthogonal 
Latin squares. We shall show that (P_1^{-1}Q_1, P_2^{-1}Q_2, ..., P_m^{-1}Q_m) is a Latin 
square. P_i^{-1}Q_i is the transformation which transforms the ith row of 
(P_1, P_2, ..., P_m) into the ith row of (Q_1, Q_2, ..., Q_m). Since every pair of 
variables occurs exactly once if the second square is imposed upon the first, 
the square (P_1^{-1}Q_1, P_2^{-1}Q_2, ..., P_m^{-1}Q_m) contains for every i and k a 
permutation which transforms x_i into x_k. But then it can not contain two 
permutations which transform x_i into x_k. This argument can be reversed and it follows 
that (P_1, P_2, ..., P_m) and (Q_1, Q_2, ..., Q_m) are orthogonal if and only if 
(P_1^{-1}Q_1, P_2^{-1}Q_2, ..., P_m^{-1}Q_m) is a Latin square. 

Denote now by an m sided square S any set of m permutations 
(S_1, S_2, ..., S_m) and by the product SS' of two squares S and S' the square 
(S_1 S_1', S_2 S_2', ..., S_m S_m'). Then we can state: Two Latin squares L_1 and L_2 
are orthogonal if and only if there exists a Latin square L_12 such that 


(1)        L_1 L_12 = L_2. 


Now let L_1, L_2, ..., L_r be a set of r mutually orthogonal Latin squares. 
Then we must have L_i L_ik = L_k where L_ik is a Latin square if i ≠ k. Hence we 
have the theorem: 

THEOREM 1: The Latin squares L_1, L_2, ..., L_r are orthogonal if and only 
if there exist r(r − 1) Latin squares L_ik (i ≠ k) such that L_i L_ik = L_k. 

COROLLARY: If L^i, L^j and L^{i+j} are Latin squares then L^i is orthogonal to L^{i+j}. 

For instance if L and L^2 are Latin squares then L is orthogonal to L^2. 


¹ Presented to the Mathematical Society October 31st, 1942. After I submitted this 
paper for publication Dr. Edward Fleisher sent me his thesis on Eulerian squares which 
he submitted in 1934 and in which he proved Theorem 3 in a different manner. 

² Research under a grant in aid of the Carnegie Corporation of New York. 

If A = (A_1, A_2, ..., A_m) and P is any permutation then we put PA = 
(PA_1, PA_2, ..., PA_m) and AP = (A_1 P, A_2 P, ..., A_m P). If A is a Latin 
square then also AP and PA are Latin squares. If A is orthogonal to B then 
AP is orthogonal to BQ for any permutations P and Q. For if AC = B then 
AP(P^{-1}CQ) = BQ, since the associative law holds for the operations indicated. 
This means that A and B remain orthogonal if we permute the variables in 
both squares in any arbitrary way. 

Hence if A is orthogonal to B also A A_1^{-1} is orthogonal to B B_1^{-1}. We can 
therefore, while preserving orthogonality, always transform the pair A and B 
so that A_1 = B_1 = 1 where 1 denotes the identity. We shall then say that 
the pair A, B is written in the reduced form. 

DEFINITION 1: If A is orthogonal to B, and if in the reduced form the 
permutations of A are the same as those of B in a different order, and if these permutations 
form a group G, then the pair A and B is said to be based on the group G. 

A pair of orthogonal Latin squares is called a Graeco-Latin square. The 
Graeco-Latin squares constructed by Bose [1] Stevens [2] and Fisher and Yates 
[3] are all based on groups. There exist Graeco-Latin squares, however, which 
are not based on a group. 

If the orthogonal pair A, B is based on a group G and if AC = B then also C 
contains only permutations of G, and since C is a Latin square it must contain 
all the permutations of G. Calling C_i the image of A_i we obtain a biunique 
mapping S of G into itself. Let A_i^S = C_i; then B_i = A_i A_i^S and S has therefore 
the property that every element of G is of the form X X^S where X is in G. 

DEFINITION 2: A biunique mapping S of a group G into itself will be called 
a complete mapping if every element of G can be represented in the form X X^S where 
X is an element of G and X^S the image of X under the mapping S. 

If an abstract group G of order m admits a complete mapping S then we can 
immediately construct an m sided Graeco-Latin square based on G. To do this 
we represent G as a regular permutation group. Let P_1, P_2, ..., P_m be the 
permutations of this representation. Then A = (P_1, P_2, ..., P_m), C = 
(P_1^S, P_2^S, ..., P_m^S) and B = (P_1 P_1^S, P_2 P_2^S, ..., P_m P_m^S) are Latin squares and 
hence A is orthogonal to B and A P_1^{-1} and B (P_1 P_1^S)^{-1} form a reduced pair. 
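
The construction can be tried on the simplest case. The sketch below assumes the cyclic group of order m written additively, so that the regular representation sends x to x + g and X X^S becomes x + s(x); the function names are illustrative and not taken from the paper. With m odd and S the identity mapping, x + s(x) = 2x runs through every residue, so S is complete.

```python
def graeco_latin_from_complete_mapping(m, s):
    """Squares A = (P_i) and B = (P_i P_i^S) for the cyclic group Z_m.

    s[i] is the image of group element i under the mapping S.
    Row i of A is the translation by i; row i of B is the translation by i + s[i].
    """
    a = [[(i + x) % m for x in range(m)] for i in range(m)]
    b = [[(i + s[i] + x) % m for x in range(m)] for i in range(m)]
    return a, b


def is_latin(square):
    """True if every row and every column contains each symbol exactly once."""
    m = len(square)
    symbols = set(range(m))
    rows_ok = all(set(row) == symbols for row in square)
    cols_ok = all({square[i][j] for i in range(m)} == symbols for j in range(m))
    return rows_ok and cols_ok


def are_orthogonal(a, b):
    """True if superimposing a on b yields every ordered pair exactly once."""
    m = len(a)
    return len({(a[i][j], b[i][j]) for i in range(m) for j in range(m)}) == m * m


# Identity mapping on Z_5 (complete because 5 is odd).
A, B = graeco_latin_from_complete_mapping(5, list(range(5)))
```

For even m the identity mapping is not complete, and the second square then fails the Latin test even though the code still runs.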

If L_1, L_2, ..., L_r are orthogonal Latin squares and L_i L_ik = L_k, then we 
form the product 


(2)        L_1 L_12 L_23 ... L_{r-1,r}. 


From L_i L_ik = L_k; L_k L_kj = L_j we find L_i L_ik L_kj = L_j and hence L_ik L_kj = L_ij. 
L_ik is therefore orthogonal to L_ij. The product (2) has the property that for 
any s ≤ r the product of s successive factors is a Latin square. On the other 
hand if a product of r Latin squares L_1, L_12, ..., L_{r-1,r} has this property then 
the Latin squares L_1, L_2, ..., L_r where L_i = L_1 L_12 L_23 ... L_{i-1,i} are orthogonal. 

DEFINITION 3: A set of r orthogonal Latin squares will be called based on a 
group G if every pair in the set is based on G. 

If L_1, L_2, ..., L_r are based on a group G then G must admit r mappings 
S_1 = 1, S_2, ..., S_r into itself such that every element of G can be written in 
the form X^{S_i + S_{i+1} + ... + S_{i+h}} for every i and h with 1 ≤ i ≤ r and 0 ≤ h ≤ r − i, 
where A^{S+S'} = A^S A^{S'}, and A^S is the image of A under the mapping S. 

DEFINITION 4: The mappings S_1 = 1, S_2, ..., S_r of a group G into itself 
will be called r-fold complete if every element of G is of the form 
X^{S_i + S_{i+1} + ... + S_{i+h}} for every i and h with 1 ≤ i ≤ r and 0 ≤ h ≤ r − i. 

Now let G be an abstract group of order m admitting an r-fold complete set 
of mappings S_1 = 1, S_2, ..., S_r. Put 

        L_i = (1^{S_1 + S_2 + ... + S_i}, P_2^{S_1 + S_2 + ... + S_i}, ..., P_m^{S_1 + S_2 + ... + S_i}) 

where 1, P_2, ..., P_m is a regular representation of G. Then L_1, L_2, ..., L_r 
is a set of r orthogonal Latin squares based on G. Put A_i = 1^{S_1 + S_2 + ... + S_i}; then 
L_1 A_1^{-1}, ..., L_r A_r^{-1} are written in the reduced form. Hence we have 


THEOREM 2: A set of r orthogonal Latin squares based on a group G exists if 
and only if G admits an r-fold complete set of mappings. 

If G is of order m = 4n + 2 = 2m' then G has a self-conjugate subgroup 
H of order m'. Suppose G admits a complete mapping S. We have 

        G = H + HA. 

X X^S ⊂ H if either X and X^S or neither of them are in H. Further X X^S ⊂ HA 
if either X or X^S but not both of them are in H. 
Let a be the number of elements X ⊂ H such that X^S ⊂ H, 
    b the number of elements X ⊂ H such that X^S ⊂ HA, 
    c the number of elements X ⊂ HA such that X^S ⊂ H, 
then a + b = m', a + c = m'. Of the products X X^S exactly b + c are in HA. 
Hence b + c = m', a = b and therefore m' = 2a, which is impossible since m' 
is odd. We have therefore: 

THEOREM 3: No (4n + 2)-sided Graeco-Latin square based on a group can 
exist. 
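
For very small orders the statement can be checked by exhaustive search. The sketch below is an illustration only and covers just the cyclic groups Z_m, written additively; the function name is an assumption. It enumerates all biunique mappings s of Z_m and keeps those for which x + s(x) takes every value.

```python
from itertools import permutations


def complete_mappings_of_cyclic_group(m):
    """Yield every complete mapping of Z_m: a permutation s with x + s(x) also a permutation."""
    for s in permutations(range(m)):
        if len({(x + s[x]) % m for x in range(m)}) == m:
            yield s


# Orders 2 and 6 are of the form 4n + 2; orders 5 and 7 are odd.
has_mapping = {
    m: next(complete_mappings_of_cyclic_group(m), None) is not None
    for m in (2, 5, 6, 7)
}
```

The search grows like m!, so it is a check for toy sizes and not a substitute for the counting argument above.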
If a group G admits r automorphisms T_1 = 1, T_2, ..., T_r such that X^{T_i} ≠ 
X^{T_j} for i ≠ j and X ≠ 1 then the mappings S_1 = 1, S_i = X^{-T_{i-1}} X^{T_i} for i = 
2, 3, ..., r are r-fold complete; for if 

        X^{S_i + S_{i+1} + ... + S_{i+h}} = Y^{S_i + S_{i+1} + ... + S_{i+h}} 

we have for i = 1 

        X^{T_{1+h}} = Y^{T_{1+h}} 

and for i > 1 

        X^{-T_{i-1}} X^{T_{i+h}} = Y^{-T_{i-1}} Y^{T_{i+h}} 

and therefore 

        (Y X^{-1})^{T_{i-1}} = (Y X^{-1})^{T_{i+h}} 

and hence Y = X in both cases since by hypothesis X^{T_i} ≠ X^{T_j} for i ≠ j and 
X ≠ 1. X^{S_i + S_{i+1} + ... + S_{i+h}} therefore takes m different values and reproduces every 
element of G. 

If we represent G as a regular permutation group then the squares L_1 = 
(1, P_2, ..., P_m), L_2 = (1, P_2^{T_2}, ..., P_m^{T_2}), ..., L_r = (1, P_2^{T_r}, ..., P_m^{T_r}) are 
orthogonal Latin squares by Theorems 1 and 2. There exist however complete 
mappings which are not derivable from automorphisms. For instance every 
group of odd order admits the complete mapping A^S = A but A^T = A^2 is not 
an automorphism if the group is not abelian. 

Most of the sets of orthogonal Latin squares that have been constructed 
so far are based on abelian groups of type (p, p, ..., p) and the mappings of 
the squares of the sets into each other are automorphisms of this group. R. C. 
Bose [1] and W. L. Stevens [2] for instance use the cyclic group of automorphisms 
of the additive group of a G.F.(p^n) induced through multiplication by the 
elements of the Galois field that are different from 0. In this way they assure 
that different automorphisms will map the same element into different elements. 
They give a convenient method for finding a base element of the group of 
automorphisms. In this way they reduce considerably the labor involved in the 
construction of p^n − 1 orthogonal Latin squares of side p^n. The 9 × 9 squares 
in the statistical tables by Fisher and Yates [3] are also based on the abelian 
group of type (3,3) but another set of automorphisms is used. 
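
For a prime p (the case n = 1) the field is the integers modulo p, and multiplication by a nonzero residue k is an automorphism of the additive group. The sketch below is a minimal illustration under that assumption, with hypothetical function names: row i of the kth square is the translation by k·i, so the entry in row i and column j is k·i + j (mod p).

```python
def orthogonal_squares_prime(p):
    """Return p - 1 Latin squares of side p (p prime); square k has entry k*i + j mod p."""
    return [
        [[(k * i + j) % p for j in range(p)] for i in range(p)]
        for k in range(1, p)
    ]


def is_latin(square):
    """True if every row and every column contains each symbol exactly once."""
    m = len(square)
    symbols = set(range(m))
    rows_ok = all(set(row) == symbols for row in square)
    cols_ok = all({square[i][j] for i in range(m)} == symbols for j in range(m))
    return rows_ok and cols_ok


def are_orthogonal(a, b):
    """True if superimposing a on b yields every ordered pair exactly once."""
    m = len(a)
    return len({(a[i][j], b[i][j]) for i in range(m) for j in range(m)}) == m * m


squares = orthogonal_squares_prime(5)
```

For prime powers p^n with n > 1 the same idea needs arithmetic in the Galois field, which this sketch does not implement.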

If m = p_1^{e_1} p_2^{e_2} ... p_n^{e_n} (p_i prime, p_i ≠ p_k for i ≠ k) and if r = min p_i^{e_i} − 1 
then a set of r orthogonal Latin squares can always be constructed from the 
abelian group of type (p_1, ..., p_1, p_2, ..., p_2, ..., p_n, ..., p_n) and its 
automorphisms. This can be done by finding r automorphisms T_1^{(i)}, T_2^{(i)}, ..., T_r^{(i)} 
for each of the subgroups of order p_i^{e_i} such that T_j^{(i)} (T_k^{(i)})^{-1} leaves no element 
unchanged except 1. If we apply the automorphisms T_j^{(1)}, T_j^{(2)}, ..., T_j^{(n)} 
simultaneously, for j = 1, 2, ..., r, we obtain r automorphisms of the desired type. 

Once the automorphisms are known the construction of the set of orthogonal 
Latin squares can easily be carried out. To do this we have to write down the 
multiplication table of the group and obtain the orthogonal squares by inter- 
changing the rows in accord with the automorphisms. 

DEFINITION 5: A set of orthogonal Latin squares derived from a group and its 
automorphisms will be called constructed by the automorphism method. 

We now prove: 

THEOREM 4: Let c_q be the number of classes of elements of order q of a group G. 
Let s = min c_q; then not more than s orthogonal Latin squares can be constructed 
from G by the automorphism method. 

PROOF: Let T be an automorphism which leaves no element unchanged 
except 1. If A is of order q then A^T is also of order q. If A^T = P^{-1}AP then 
there exists an element Q such that P = Q^{-1}Q^T because, as we have shown, every 
element can be represented in the form X^{-1}X^T. But then 


        (QAQ^{-1})^T = QP P^{-1}AP P^{-1}Q^{-1} = QAQ^{-1}. 


Hence A = 1. T can therefore not transform any element except 1 into an 
element of the same class. Hence not more than s = min c_q automorphisms 
T_1, ..., T_s can exist such that T_i^{-1}T_j leave no element except 1 fixed and this 
proves our theorem. 

COROLLARY: If m = p_1^{e_1} p_2^{e_2} ... p_n^{e_n} (p_j prime, p_j ≠ p_k for j ≠ k) then not more 
than r = min p_i^{e_i} − 1 orthogonal m-sided Latin squares can be constructed from 
any group with the automorphism method. 

PROOF: The Sylow group of order p_i^{e_i} contains a representative of every class 
of elements of order p_i; hence min c_q ≤ min p_i^{e_i} − 1. 

Below are given two examples of Graeco-Latin squares obtained from 
complete mappings which are not obtained from automorphisms. Neither could 
have been obtained by combining Graeco-Latin squares constructed by the 
method of Bose [1] and Stevens [2]. 

The first example is based on the abelian group of type (2,2,3). If the basis 
elements are defined by P^2 = R^2 = Q^3 = 1 the complete mapping used is given by 

        L_1  = (1, P, R, PR, Q, PQ, RQ, PRQ, Q^2, PQ^2, RQ^2, PRQ^2) 
        L_12 = (1, RQ, PRQ^2, PQ^2, Q, RQ^2, PR, P, Q^2, R, PRQ, PQ) 
        L_2  = (1, PRQ, PQ^2, RQ^2, Q^2, PR, PQ, RQ, Q, PRQ^2, P, R). 

The second square is based on the regular representation of A_4, the 
alternating group in 4 variables. The generating relations are P^2 = R^2 = Q^3 = 1, 
QP = RQ, QR = PRQ. The complete mapping is given by 

        L_1  = (1, P, R, PR, Q, PQ, RQ, PRQ, Q^2, PQ^2, RQ^2, PRQ^2) 
        L_12 = (1, R, PR, P, Q, PQ, RQ, PRQ, Q^2, PQ^2, RQ^2, PRQ^2) 
        L_2  = (1, PR, P, R, Q^2, PRQ^2, PQ^2, RQ^2, Q, RQ, PRQ, PQ). 

EXAMPLE 1 

         1,1    2,2   ...    6,6    7,7    8,8    9,9   ... 
         2,8    1,7   ...    5,11   8,10   7,9   10,4   ... 
         3,10   4,9   ...    8,1    5,4    6,3   11,6   ... 
         4,11   3,12  ...    7,4    6,1    5,2   12,7   ... 
         5,9    6,10  ...   10,2   11,3   12,4    1,5   ... 
         6,4    5,3   ...    9,7   12,6   11,5    2,12  ... 
         7,6    8,5   ...   12,9    9,12  10,11   3,2   ... 
         8,7    7,8   ...   11,12  10,9    9,10   4,3   ... 
         9,5   10,6   ...    2,10   3,11   4,12   5,1   ... 
        10,12   9,11  ...    1,3    4,2    3,1    6,8   ... 
        11,2   12,1   ...    4,5    1,8    2,7    7,10  ... 
        12,3   11,4   ...    3,8    2,5    1,6    8,11  ... 

EXAMPLE 2 

         1,1    2,2   ...    6,6    7,7    8,8    9,9   ... 
         2,4    1,3   ...    5,7    8,6    7,5   10,12  ... 
         3,2    4,1   ...    8,5    5,8    6,7   11,10  ... 
         4,3    3,4   ...    7,8    6,5    5,6   12,11  ... 
         5,9    7,12  ...   11,4   12,2   10,3    1,5   ... 
         6,12   8,9   ...   12,1   11,3    9,2    2,8   ... 
         7,10   5,11  ...    9,3   10,1   12,4    3,6   ... 
         8,11   6,10  ...   10,2    9,4   11,1    4,7   ... 
         9,5   12,7   ...    4,11   2,12   3,10   5,1   ... 
        10,7   11,5   ...    3,9    1,10   4,12   6,3   ... 
        11,8   10,6   ...    2,10   4,9    1,11   7,4   ... 
        12,6    9,8   ...    1,12   3,11   2,9    8,2   ... 
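
The two complete mappings of the examples can be checked mechanically. The sketch below is an illustration with several assumptions: elements are written as a product of a member of {1, P, R, PR} and a power of Q (with `Q2` standing for Q^2), the image and product lists are a reading of the printed arrays in which exponents lost in the print are restored from the requirement that both rows be permutations of the group, and all names are hypothetical. For A_4 the relations QP = RQ and QR = PRQ are used to move Q past P and R.

```python
V4 = {"1": (0, 0), "P": (1, 0), "R": (0, 1), "PR": (1, 1)}
ELEMENTS = ["1", "P", "R", "PR", "Q", "PQ", "RQ", "PRQ",
            "Q2", "PQ2", "RQ2", "PRQ2"]


def parse(name):
    """'PRQ2' -> ((1, 1), 2): exponents of P and R, then the exponent of Q."""
    if "Q" not in name:
        return V4[name], 0
    v, _, k = name.partition("Q")
    return V4[v or "1"], int(k or 1)


NAME_OF = {parse(name): name for name in ELEMENTS}


def no_twist(w):
    """Abelian group of type (2,2,3): Q commutes with P and R."""
    return w


def a4_twist(w):
    """A_4: moving Q to the right turns P into R and R into PR."""
    return (w[1], w[0] ^ w[1])


def multiply(x, y, twist):
    """Product (v Q^k)(w Q^l) = v * twist^k(w) * Q^(k+l)."""
    (v, k), (w, l) = x, y
    for _ in range(k):
        w = twist(w)
    return ((v[0] ^ w[0], v[1] ^ w[1]), (k + l) % 3)


def check_complete_mapping(images, products, twist):
    """True if images and X X^S are both permutations and X X^S matches `products`."""
    computed = [NAME_OF[multiply(parse(x), parse(s), twist)]
                for x, s in zip(ELEMENTS, images)]
    return (sorted(images) == sorted(ELEMENTS)
            and sorted(computed) == sorted(ELEMENTS)
            and computed == products)


# Reading of the first mapping (abelian group of type (2,2,3)).
IMAGES_1 = ["1", "RQ", "PRQ2", "PQ2", "Q", "RQ2", "PR", "P",
            "Q2", "R", "PRQ", "PQ"]
PRODUCTS_1 = ["1", "PRQ", "PQ2", "RQ2", "Q2", "PR", "PQ", "RQ",
              "Q", "PRQ2", "P", "R"]

# Reading of the second mapping (alternating group A_4).
IMAGES_2 = ["1", "R", "PR", "P", "Q", "PQ", "RQ", "PRQ",
            "Q2", "PQ2", "RQ2", "PRQ2"]
PRODUCTS_2 = ["1", "PR", "P", "R", "Q2", "PRQ2", "PQ2", "RQ2",
              "Q", "RQ", "PRQ", "PQ"]

first_ok = check_complete_mapping(IMAGES_1, PRODUCTS_1, no_twist)
second_ok = check_complete_mapping(IMAGES_2, PRODUCTS_2, a4_twist)
```

Numbering the elements 1 to 12 in the order of `ELEMENTS`, the product rows also give the second entries of the first column of the two printed squares.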
A METHOD OF DETERMINING EXPLICITLY THE COEFFICIENTS 
OF THE CHARACTERISTIC EQUATION 


By P. A. SAMUELSON 


Massachusetts Institute of Technology 


1. Introduction. When an investigator is interested in all of the latent roots 
of the characteristic equation of a matrix and not in its latent vectors, it is 
sometimes desirable to expand out the determinantal equation in order to 
determine explicitly the polynomial coefficients (p_1, p_2, ..., p_n) in the expression 

(1)        D(λ) = |λI − a| = λ^n + p_1 λ^{n−1} + ... + p_{n−1} λ + p_n. 

This can be done in a variety of ways, all of which are necessarily somewhat 
tedious for high order matrices. Except for sign the coefficients are respectively 
the sum of a's principal minors of a given order. These can be computed 
efficiently by "pivotal" methods [1]. Alternatively through the utilization of 
the Cayley-Hamilton theorem, whereby a matrix satisfies its own characteristic 
equation, the p's appear as the solution of n linear equations [2, 3]. In a third 
method Horst has employed Newton's formula concerning the powers of roots 
to derive the p's as the solution of a triangular set of equations, the coefficients 
of the latter only being attained after considerable matrix multiplication [4]. 
A fourth method suggested to me by Professor E. Bright Wilson, Jr. of Harvard 
University, consists of evaluating D(λ) for n values of λ, presumably by efficient 
"Doolittle" methods; to these n points, Lagrange's interpolation formula is 
applied to determine the n coefficients explicitly. 
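
The third method can be illustrated briefly. The sketch below is an assumption about how Newton's formula would be applied, not a transcription of Horst's procedure: it forms the power sums s_k of the latent roots as the traces of a, a^2, ..., a^n and solves the triangular relations s_k + p_1 s_{k−1} + ... + p_{k−1} s_1 + k p_k = 0 in turn. Exact fractions are used so that the result can be compared without rounding; the function names are illustrative.

```python
from fractions import Fraction


def matmul(x, y):
    """Product of two square matrices given as lists of rows."""
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def char_poly_by_power_sums(a):
    """Return [p_1, ..., p_n] with |λI − a| = λ^n + p_1 λ^(n−1) + ... + p_n."""
    n = len(a)
    a = [[Fraction(v) for v in row] for row in a]
    power = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    s = []                                   # s[k-1] = trace of a^k
    for _ in range(n):
        power = matmul(power, a)
        s.append(sum(power[i][i] for i in range(n)))
    p = []
    for k in range(1, n + 1):
        total = s[k - 1] + sum(p[j - 1] * s[k - j - 1] for j in range(1, k))
        p.append(-total / k)                 # Newton's formula solved for p_k
    return p


coefficients = char_poly_by_power_sums([[2, 1], [1, 3]])
```

The n matrix products account for the "considerable matrix multiplication" mentioned above.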

























2. The New Method. The present paper describes a new computational 
method based upon well-known dynamical considerations. A single nth order 
differential equation can be converted into "normal" form, involving n first order 
differential equations. This is easily done by defining appropriate new variables. 
If the original nth order differential equation is written as 

(2)        X^{(n)}(t) + p_1 X^{(n−1)}(t) + ... + p_{n−1} X'(t) + p_n X(t) = 0, 

then the new normal system can be written as 

(3)        X_i'(t) = Σ_{j=1}^{n} b_ij X_j(t),        (i = 1, ..., n) 

where 

                |   0       1        0      ...    0   | 
                |   0       0        1      ...    0   | 
(4)        b =  |   .       .        .             .   | 
                |   0       0        0      ...    1   | 
                | −p_n  −p_{n−1}  −p_{n−2}  ...  −p_1  | 

is the so-called companion matrix to the polynomial in question. 
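
A short sketch may make the companion matrix concrete. It assumes the variables are chosen as X_1 = X, X_2 = X', ..., X_n = X^{(n−1)}, which gives ones above the diagonal and the coefficients with reversed sign in the last row. The check that b satisfies its own polynomial is added for illustration; the function names are assumptions.

```python
from fractions import Fraction


def companion(p):
    """Companion matrix of λ^n + p[0] λ^(n−1) + ... + p[n−1]."""
    n = len(p)
    b = [[Fraction(0)] * n for _ in range(n)]
    for i in range(n - 1):
        b[i][i + 1] = Fraction(1)               # X_i' = X_{i+1}
    for j in range(n):
        b[n - 1][j] = -Fraction(p[n - 1 - j])   # last row: −p_n, ..., −p_1
    return b


def matmul(x, y):
    """Product of two square matrices given as lists of rows."""
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def polynomial_at_matrix(p, b):
    """Return b^n + p[0] b^(n−1) + ... + p[n−1] I, evaluated by Horner's rule."""
    n = len(b)
    identity = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    result = identity
    for coeff in p:
        result = matmul(result, b)
        result = [[result[i][j] + coeff * identity[i][j] for j in range(n)]
                  for i in range(n)]
    return result


b = companion([2, -3, 5])
residual = polynomial_at_matrix([2, -3, 5], b)
```

Here `residual` is the zero matrix, in line with the Cayley-Hamilton theorem quoted in the introduction.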
The reverse process of going from a normal system in many variables to a 
single high order equation is not so simple. Yet it can be done, and in so doing 
we attain the required polynomial coefficients [5]. If 


(5)        x'(t) = a x(t) 


represents the normal system in matrix form, then symbolically 

(6)        D(d/dt) X_1(t) = X_1^{(n)}(t) + p_1 X_1^{(n−1)}(t) + ... + p_{n−1} X_1'(t) + p_n X_1(t) = 0. 


Because we wish to find out the expanded form of D(λ), this relationship is of 
no use to us. Since similar matrices have the same characteristic equation, 
ours is the problem of finding a non-singular matrix C, such that 


(7)        C^{-1} a C = b, 


where b is of the form given in equation (4). 

This problem can be approached from an elementary algebraic viewpoint. 
The relationships in (5) represent n linear equations between 2n variables, 
[X_1'(t), X_2'(t), ..., X_n'(t), X_1(t), X_2(t), ..., X_n(t)]. These are not sufficient to 
eliminate the 2(n − 1) variables not involving the subscript 1. However, 
inasmuch as (5) holds for all values of t we may differentiate it repeatedly until we 
finally have the system of equations 


        −X_1^{(n)} + a_11 X_1^{(n−1)} + ... + a_1n X_n^{(n−1)} = 0 
        . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
        −X_n^{(n)} + a_n1 X_1^{(n−1)} + ... + a_nn X_n^{(n−1)} = 0 
        −X_1^{(n−1)} + a_11 X_1^{(n−2)} + ... + a_1n X_n^{(n−2)} = 0 
(8)     . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
        −X_n^{(n−1)} + a_n1 X_1^{(n−2)} + ... + a_nn X_n^{(n−2)} = 0 
        . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
        −X_1' + a_11 X_1 + ... + a_1n X_n = 0 
        . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
        −X_n' + a_n1 X_1 + ... + a_nn X_n = 0 


These are n^2 linear equations in n^2 + n variables. We wish to eliminate all 
variables which have a subscript other than one; namely, (X_2, ..., X_n, 
X_2', ..., X_n', ..., X_2^{(n)}, ..., X_n^{(n)}). These are (n + 1)(n − 1) = n^2 − 1 in 
number. We may utilize all but one of the n^2 equations to perform this 
elimination. The remaining equation after substitution will be the desired high order 
equation, and its coefficients are the polynomial coefficients. 

Ordinarily one would solve all but one of the equations for the values of the 
variables to be eliminated. These would then be substituted into the remaining 
equation. Actually from the computational standpoint it is unnecessary to 
solve completely for any unknowns. The so-called "forward" solution of the 
usual Gauss-Doolittle technique automatically performs the elimination or 
substitution, without necessary recourse to a "back" solution for the values of 
the eliminated variables. These values are in any case of no interest. 

There is no unique order in which the equations must be reduced. Indeed, 
when one order fails because a leading principal minor vanishes, we may switch 
to another. A suggested convenient order is given below. Let 

















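            | a_11  a_12  ...  a_1n | 
            | a_21  a_22  ...  a_2n |     | a_11   R | 
        a = |  .     .          .   |  =  |          | ,     I = (δ_ij),   (i, j = 1, ..., n − 1). 
            | a_n1  a_n2  ...  a_nn |     |  S     M | 


Then, consider the partitioned matrix 


            | −I    M    0   ...   0    0  |  0   −S    0   ...   0      0   | 
            |  0   −I    M   ...   0    0  |  0    0   −S   ...   0      0   | 
            |  .    .    .         .    .  |  .    .    .         .      .   | 
(9)    W =  |  0    0    0   ...  −I    M  |  0    0    0   ...   0     −S   | 
            |  0    0    0   ...   0    R  |  0    0   ...   0    1   −a_11  | 
            |  0    0    0   ...   R    0  |  0    0   ...   1  −a_11    0   | 
            |  .    .    .         .    .  |  .    .         .    .      .   | 
            |  0    R    0   ...   0    0  |  1  −a_11  0   ...   0      0   | 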








It is simply the matrix of the equations in (8) with the variables 
(X_1, X_1', ..., X_1^{(n)}) shifted over to the right-hand side, and with the equations 
in which the variable one leads off being placed at the bottom. 

If the usual "forward" Doolittle technique is followed, then the final elements 
computed, corresponding to the elements in the lower right-hand box, are the 
coefficients (1, p_1, p_2, ..., p_n). It is the present writer's experience that the 
Crout form [6], like Dwyer's [7] the last word in Doolittle abbreviation, is to be 
recommended, particularly since we are dealing with an asymmetrical matrix. 
A clerk masters its ritual in a few minutes, and the speeds achieved once the 
operations become mechanical are impressive. 

For the trivial case of determining the coefficients corresponding to a two by 
two matrix the W matrix is of the form 


            | −1    a_22    0    |  0     −a_21     0    | 
            |  0    −1     a_22  |  0       0     −a_21  | 
(10)        |  0     0     a_12  |  0       1     −a_11  | 
            |  0    a_12    0    |  1     −a_11     0    | 


The Auxiliary Crout matrix becomes 


            | −1    a_22    0    |  0        −a_21                  0             | 
            |  0    −1     a_22  |  0          0                  −a_21           | 
(11)        |  0     0     a_12  |  0          1                  −a_11           | 
            |  0   −a_12   a_22  |  1    (−a_11 − a_22)   (−a_12 a_21 + a_11 a_22) | 


The answer in the lower right-hand box will immediately be recognized as the 
correct one. I have found it convenient to vary the precise Crout routine by 
dividing vertical columns by the "leading" diagonal element, rather than 
horizontal columns. This is a matter of indifference and saves some 
computations. As in the higher order cases, the presence of the identity matrix along 
the diagonal reduces most of the computations to mere copying. Actually the 
intelligent computer will soon notice that most of the copying may be eliminated 
since the numbers in question are to be added in later in other sums of products. 
After eliminating unknowns corresponding to the equations above the line on 
which (9) is written, there results the system 


            R         |  0    0    ...    0      0      1    −a_11 
            RM        |  0    0    ...    0      1    −a_11   −RS 
(12)        RM^2      |  0    0    ...    1    −a_11   −RS    −RMS 
            . . .     |  . . . 
            RM^{n−1}  |  1   −a_11   −RS   −RMS   ...   −RM^{n−2}S 


Thus, it would be simpler to start from this stage, avoiding unnecessary copying. 
This remark shows that the present method is related to the Cayley-Hamilton 
methods described in [2] and [3], since the above set is derivable from the set 


            e a^0  |  1    0    0    ...    0 
            e a^1  |  0    1    0    ...    0 
(13)        e a^2  |  0    0    1    ...    0 
            . . .  |  . . . 
            e a^n  |  0    0    0    ...    1 


The last named set appears in the Cayley-Hamilton method when the first row 
of the powers of the original matrix are used in setting up n equations to deter- 
mine our n unknowns. Although related, the two methods are distinct since 
in the Cayley-Hamilton method one would arrive at a different set of equations 
after straightforward elimination of one variable, and since it would be shorter 
to dispense with the identity matrix used in the Aitken method in favor of the 
solution of a single set of equations by the usual Doolittle ‘“back-solution.” 
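
The Cayley-Hamilton computation referred to here can be sketched as follows. This is an illustration under stated assumptions and not the Doolittle routine itself: the first rows of a^0, a^1, ..., a^n are formed by repeated multiplication of a row vector by a, and the n unknowns p_1, ..., p_n are found from the n linear equations expressing that the first row of a^n + p_1 a^{n−1} + ... + p_n I vanishes. Plain Gauss-Jordan elimination with exact fractions stands in for the hand routine, and the function names are hypothetical.

```python
from fractions import Fraction


def solve(matrix, rhs):
    """Solve matrix · x = rhs by Gauss-Jordan elimination with exact fractions."""
    n = len(matrix)
    aug = [list(row) + [r] for row, r in zip(matrix, rhs)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            raise ValueError("singular system: the first rows are linearly dependent")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        aug[col] = [v / aug[col][col] for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [aug[i][n] for i in range(n)]


def char_poly_from_first_rows(a):
    """Return [p_1, ..., p_n] from the first rows of the powers of a."""
    n = len(a)
    a = [[Fraction(v) for v in row] for row in a]
    rows = [[Fraction(int(j == 0)) for j in range(n)]]   # first row of a^0
    for _ in range(n):
        prev = rows[-1]
        rows.append([sum(prev[k] * a[k][j] for k in range(n)) for j in range(n)])
    # One equation per column j: sum over k of p_k * rows[n-k][j] = -rows[n][j].
    matrix = [[rows[n - k][j] for k in range(1, n + 1)] for j in range(n)]
    rhs = [-rows[n][j] for j in range(n)]
    return solve(matrix, rhs)


coefficients = char_poly_from_first_rows([[2, 1], [1, 3]])
```

Only vector-by-matrix products are needed, about n^3 multiplications in all. For a diagonal matrix the first rows are dependent and the sketch raises an error, an instance of the singular cases in which a modified procedure is required.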

The reader will easily see how the method may be modified to handle the more 
general case of determining the coefficients of 


(14)        D(λ) = |cλ + a| = 0, 


where c and a are any matrices. The method also can be used to reduce a 
polynomial equation involving a determinant of the nth order, each of whose 
coefficients are of a given degree in λ, to a lower order determinant whose 
coefficients are of higher degree in λ. 

The present method derives the p's as the algebraic solution of high order 
linear equations. It would therefore seem inferior to those methods which need 
only solve a system of n equations. However, two remarks are in order. The 
matrix of the high order system can be written down immediately without 
computation. Furthermore, most of the elements in the matrix are zeros, so 
that a mere counting of the equations is not a true indication of the labor 
involved. 


3. Some comparisons between present method and other methods. Within 
the brief compass of the present work it is not possible to give an exhaustive 
appraisal of the comparative computational efficiencies of the methods 
mentioned. In general, a computing method is to be judged in terms of the number 
of multiplications that it involves, although other considerations such as the 
number of additions, the magnitude and sign of the numbers handled, the 
repetitiveness of the operations involved, the adaptability to punch card 
machinery, etc. are modifying factors. In this discussion the power of a method 
will be taken to be an inverse function of the number of multiplications that it 
involves. 

It may be said first of all that inasmuch as the minimum number of 
multiplications involved in computing an nth order determinant is of the order of 
n^3, even with the most efficient "pivotal" methods, direct computation of the 
coefficients by principal minors involves, for sufficiently large n, computation 
of the order of n^4. The same is true of the Wilson method described above. 
The Horst method, and any other that requires the explicit n powers of an nth 
order matrix, also asymptotically requires multiplications of the order of n^4. 
This does not mean that the above three methods are equally powerful for small 
n, nor even asymptotically, since the coefficients of the n^4 term in the formula 
for the requisite number of multiplications may not be equal. In fact, Reiersøl 
[1] has shown that his method is better than Horst's for small n, but 
asymptotically less powerful. 

It can also be shown that the Cayley-Hamilton methods which simply involve 
products of the powers of a matrix with row or column vectors are asymptotically 
more powerful than any of the above methods, the work only increasing as the 
cube of n. This is true whether the longer Aitken form of reduction is 
employed or whether the usual Doolittle back-solution is followed. The present 
method is also an efficient one in the sense that its requisite number of 
multiplications increases with the cube of n. For small values of n and asymptotically 
it can be shown to be more powerful than the Cayley-Hamilton method which 
uses the Aitken method of reduction, although in the limit as n becomes large 
the ratio of the powers of the two methods approaches unity. 

It is of the greatest interest to compare the power of the new method with the 
shorter Doolittle C-H method. It can easily be shown that the coefficients of n^3 in 
the expressions giving the respective requisite number of multiplications differ 
in such a way as to make the C-H method more powerful after some value of n, 
the ratio of the respective powers approaching the limit 8/9. However, for 
low order matrices the new method is the more powerful. The reader may 
easily verify this for the case of a second order matrix. Below a sixth order 
matrix the present method seems to involve the smaller number of 
multiplications. For a sixth order matrix the two methods seem to involve the same 
number of multiplications (multiplications by unity not being counted). For 
matrices of the seventh order or higher the C-H method seems to be optimal. 

As compared to an explicit evaluation of the coefficients by a straightforward 
computation of principal minors according to the fundamental definition of a 
determinant as the sum of signed products of elements, all of the methods 
discussed are efficient, since the work in the former increases faster than any 
power of n. However, for each of the methods discussed, in singular cases the 
method of reduction may fail so that modified procedures will be necessary. In 
actual practice such singularities will ‘‘almost never’? be encountered. But in 
the neighborhood of such singular points the computations become extremely 
sensitive to any rounding off of digits. Consequently, it is from the nature of 
the case impossible ever to develop exact rules for the maximum error involved 
in any given calculation. 
This section is devoted to brief research and expository articles, notes on methodology 
and other short items. 

A NOTE ON THE THEORY OF MOMENT GENERATING FUNCTIONS 


By J. H. Curtiss 

Cornell University 


Let X be a one-dimensional variate and let F(x) be its distribution function.¹ 
The function 


        G(α) = E(e^{αX}) = ∫_{−∞}^{+∞} e^{αx} dF(x),        α real, 


in which the integral is assumed to converge for α in some neighborhood of the 
origin, is called the moment generating function of X. In dealing with certain 
distribution problems, this function has been widely used by statisticians, and 
especially by the English writers, in place of the closely-related characteristic 
function f(t) = E(e^{itX}). It is known that a characteristic function uniquely 
determines the corresponding distribution, and that if a sequence of 
characteristic functions approaches a limit, the corresponding sequence of distribution 
functions does likewise. (These results are more accurately stated below.) The 
appropriate analogues for the moment generating function of these theorems are 
apparently not too readily accessible in the literature, if they have been treated 
at all, and it seems worthwhile to record them in this note. 

Henceforth we abbreviate distribution function to d.f., moment generating 
function to m.g.f., and characteristic function to c.f. The variables α and t will 
always be real, in contradistinction to the complex variable s, to be introduced 
in the next paragraph. 

The uniqueness property of the c.f. may be stated as follows: If F_1(x) and 
f_1(t) are the d.f. and c.f. of one variate, and F_2(x) and f_2(t) are those of another, 
and if f_1(t) = f_2(t) for all² t, then F_1(x) = F_2(x) for all x [1, p. 28]. To study the 
corresponding situation for the m.g.f., we first observe that 


        φ(s) = E(e^{sX}) = ∫_{−∞}^{+∞} e^{sx} dF(x),        s complex, 








is a bilateral Laplace-Stieltjes transform. If such a transform exists for real 
values of s in an interval −α_1 < s < α_1, α_1 > 0, it must exist for all complex 
values of s in the strip −α_1 < ℜs < α_1 and represent there an analytic 
function of s [5, p. 238]. Evidently φ(α) = G(α), φ(it) = f(t). Suppose now that 
F_1(x), G_1(α), f_1(t), are the d.f., m.g.f., and c.f. of a variate X_1, and F_2(x), G_2(α), 
f_2(t), are those of X_2. Let φ_1(s) = E(e^{sX_1}), φ_2(s) = E(e^{sX_2}), s complex. If 
G_1(α) = G_2(α) for all α in some interval, however small, containing the origin, 
then by a familiar property of analytic functions [2, p. 116], φ_1(s) = φ_2(s) 
throughout the corresponding strip of analyticity, and so on the axis of 
imaginaries. This means that f_1(t) = f_2(t), all t, and therefore F_1(x) = F_2(x). We 
have: 

THEOREM 1. A m.g.f. existing in some neighborhood of α = 0 uniquely 
determines the corresponding distribution. 

¹ Or cumulative frequency function; our notation and terminology are uniform with 
that of [1] except for the use of the term "variate" instead of "random variable." 

² It is possible for two non-identical distributions to have c.f.'s which are identical 
throughout an interval of values of t containing the origin; an example is given in [4], p. 190. 
The author is obliged to Professor Wintner and Professor Feller for pointing out the 
existence of this particular example. 

We turn now to distributions of variable form. Because certain of the 
versions to be found in the literature are incomplete, it seems worth while to give 
here a full statement of the basic limit theorem for sequences of c.f.'s, due to 
P. Lévy and sometimes called Lévy's Continuity Theorem [4, pp. 48-50]. 

THEOREM 2. Let the distribution of a variate X_n depend on a parameter n, and 
let F_n(x) and f_n(t) be the d.f. and c.f. of X_n. 

(a) If there exists a variate X with d.f. F(x) such that lim_{n→∞} F_n(x) = F(x) at 
every continuity point of F(x), then lim_{n→∞} f_n(t) = f(t) uniformly in each finite 
interval on the t-axis, where f(t) is the c.f. of X. 

(b) If there exists a function f(t) such that lim_{n→∞} f_n(t) = f(t), all t,³ and 
uniformly⁴ in some open interval containing the origin, then there exists a variate X 
with d.f. F(x) such that lim_{n→∞} F_n(x) = F(x) at each continuity point and uniformly 
in any finite or infinite interval of continuity of F(x). The c.f. of X is f(t), and 
lim_{n→∞} f_n(t) = f(t) uniformly in each finite interval. 

We now develop the corresponding theorem for the m.g.f. In the first place, 
it is not difficult to see that part (a) will have no direct analogue, even if we add 
to the hypothesis the conditions that the m.g.f. of X,, exists in some fixed interval 
for all n and that the m.g.f. of X also exists in some interval. For example, 
the d.f. 


(0,2 < —n 
F(z) = 13 + k, are tan nz, 
la, 2 


3 The condition that lim,_,,,f,(t) exist on at least an everywhere dense set of points on the 
t-axis is essential to the proof as given in Cramer’s book [1, pp. 29-30], but is omitted in his 
statement of the theorem, and is not stated clearly in certain other treatments by other 
authors. 

‘ For a discussion of this uniformity condition, and possible alternatives, see [l, p. 29 
(footnote)]. The condition may, for instance, be replaced by the assumption that f(t) 
is continuous at ¢ = 0. 
where $k_n = 1/(2 \arctan n^2)$, clearly tends as $n \to \infty$ to the d.f.

    $F(x) = \begin{cases} 0, & x < 0, \\ 1, & x \ge 0, \end{cases}$

at all points of continuity of the latter d.f. The m.g.f. corresponding to
$F_n(x)$ is

    $G_n(\alpha) = \int_{-n}^{n} k_n e^{\alpha x} \dfrac{n}{1 + n^2 x^2}\, dx$,

which for each $n$ exists for all $\alpha$, and the m.g.f. corresponding to $F(x)$ is simply
the constant 1. Clearly, for $\alpha > 0$,

    $G_n(\alpha) > \int_{0}^{n} k_n \dfrac{\alpha^3 x^3}{3!}\, \dfrac{n}{1 + n^2 x^2}\, dx$,

and from this (together with the symmetry $G_n(-\alpha) = G_n(\alpha)$) it can easily be
verified that $\lim_{n\to\infty} G_n(\alpha) = \infty$ if $\alpha \ne 0$. In short, mere
convergence of a sequence of d.f.'s tells little about the behavior of the
corresponding sequence of m.g.f.'s.
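The behavior of this example can also be examined numerically. The following Python sketch is only an illustration: the function names and the Simpson-rule step of $1/(50n)$ are assumptions made here, not part of the original argument. It evaluates $F_n(x)$ directly and approximates $G_n(\alpha)$ by quadrature; for the values tried, $F_n(0.1)$ moves toward 1 while $G_n(1)$ grows rapidly.

```python
import math

def k(n):
    # Normalising constant k_n = 1 / (2 arctan n^2).
    return 1.0 / (2.0 * math.atan(n * n))

def F(n, x):
    # Distribution function F_n(x) of the example.
    if x < -n:
        return 0.0
    if x > n:
        return 1.0
    return 0.5 + k(n) * math.atan(n * x)

def G(n, alpha):
    # Composite Simpson approximation to
    #   G_n(alpha) = integral_{-n}^{n} k_n e^{alpha x} n / (1 + n^2 x^2) dx.
    # The step 1/(50 n) is an assumed resolution, chosen to resolve the peak at 0.
    steps = 100 * n * n              # even number of subintervals (n an integer)
    h = 2.0 * n / steps

    def f(x):
        return k(n) * math.exp(alpha * x) * n / (1.0 + (n * x) ** 2)

    total = f(-n) + f(n)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(-n + i * h)
    return total * h / 3.0

for n in (5, 10, 20):
    print(n, round(F(n, 0.1), 4), G(n, 1.0))
```

Setting `alpha = 0` gives a quick sanity check, since $G_n(0)$ must equal 1 for every $n$.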

Part (b) assumes the following form: 

THEOREM 3. Let $F_n(x)$ and $G_n(\alpha)$ be respectively the d.f. and m.g.f. of a
variate $X_n$. If $G_n(\alpha)$ exists for $|\alpha| < \alpha_1$ and for all $n \ge n_0$, and if
there exists a finite-valued function $G(\alpha)$ defined for $|\alpha| \le \alpha_2 < \alpha_1$,
$\alpha_2 > 0$, such that $\lim_{n\to\infty} G_n(\alpha) = G(\alpha)$, $|\alpha| \le \alpha_2$, then there
exists a variate $X$ with d.f. $F(x)$ such that $\lim_{n\to\infty} F_n(x) = F(x)$ at each
continuity point and uniformly in each finite or infinite interval of continuity
of $F(x)$. The m.g.f. of $X$ exists for $|\alpha| \le \alpha_2$ and is equal to $G(\alpha)$ in that
interval.

To prove the theorem, we introduce the Laplace transform $\varphi_n(s) = E(e^{sX_n})$
and observe that $|\varphi_n(s)| \le \varphi_n(\alpha) = G_n(\alpha)$, $s = \alpha + it$, $n \ge n_0$, for
any $s$ in the strip $-\alpha_1 < \Re s < \alpha_1$. By applying Leibniz's rule for
differentiation under an integral sign (extended to Stieltjes integrals), we find
[5, p. 240] that

    $G_n''(\alpha) = \int_{-\infty}^{+\infty} x^2 e^{\alpha x}\, dF_n(x), \quad |\alpha| < \alpha_1$,

from which it appears that $G_n''(\alpha) > 0$, $|\alpha| < \alpha_1$. This means that the function
$G_n(\alpha)$ assumes its maximum value in the interval $|\alpha| \le \alpha_2$ at either or both
endpoints of the interval. But of course $G_n(\alpha_2)$ and $G_n(-\alpha_2)$ both approach
finite limits as $n$ becomes infinite, so it follows that the sequence $\{G_n(\alpha)\}$,
$n \ge n_0$, is uniformly bounded in the interval $|\alpha| \le \alpha_2$. Thus the sequence
$\{|\varphi_n(s)|\}$, $n \ge n_0$, is uniformly bounded in the strip $-\alpha_2 \le \Re s \le \alpha_2$, and
moreover has a limit at each point of an infinite set possessing a limit point in
the strip (i.e., at each point of the interval $-\alpha_2 \le s \le \alpha_2$). So by Vitali's
Theorem [3, pp. 156-160, 240], there exists an analytic function $\varphi^*(s)$ such that
$\lim_{n\to\infty} \varphi_n(s) = \varphi^*(s)$ uniformly in each bounded closed subregion of the strip
$-\alpha_2 < \Re s < \alpha_2$. Since $\varphi_n(it)$ is the c.f. of $X_n$, the existence of the limiting
distribution follows from Theorem 2(b).
Of course, $\varphi^*(\alpha) = G(\alpha)$, $-\alpha_2 < \alpha < \alpha_2$. It remains to show that $\varphi^*(\alpha)$
is the m.g.f. of $X$. Theorem 2(b) states that $\varphi^*(it)$ is the c.f. of $X$. If we can
show that the function $\varphi(s) = E(e^{sX})$ exists at least in the strip
$-\alpha_2 < \Re s < \alpha_2$, then since $\varphi(s) = \varphi^*(s)$ on the axis of imaginaries, the
equality must be valid in the entire strip, and so in particular on the interval of
the real axis inside the strip.

It will suffice for this purpose to show that $\varphi(\alpha)$ exists for $-\alpha_2 \le \alpha \le \alpha_2$.
Suppose indeed that $\varphi(\alpha)$ does not exist at some point $\alpha = \alpha_3$ in this interval.
That means that if

    $M = \text{l.u.b.}\,[G_n(\alpha_3),\ n \ge n_0]$,

we can find a real number $A$ such that

(1)    $\int_{-A}^{A} e^{\alpha_3 x}\, dF(x) > M$.

But

(2)    $\int_{-A}^{A} e^{\alpha_3 x}\, dF(x) = \int_{-A}^{A} e^{\alpha_3 x}\, dF_n(x) + \left[ \int_{-A}^{A} e^{\alpha_3 x}\, dF(x) - \int_{-A}^{A} e^{\alpha_3 x}\, dF_n(x) \right]$.

Since $\lim_{n\to\infty} F_n(x) = F(x)$ at all continuity points of $F(x)$, and so on an
everywhere dense set of points, the Helly-Bray Theorem [5, p. 31] states that the
expression in brackets in (2) approaches zero as $n$ becomes infinite. Meanwhile

    $\int_{-A}^{A} e^{\alpha_3 x}\, dF_n(x) \le \int_{-\infty}^{+\infty} e^{\alpha_3 x}\, dF_n(x) \le M, \quad n \ge n_0$.

Thus we arrive at the conclusion that the left member of (2) must be less than
or equal to $M$, which contradicts (1).

To be sure, we have only proved that the m.g.f. of $X$ is equal to $\varphi^*(\alpha)$ or $G(\alpha)$
in the open interval $-\alpha_2 < \alpha < \alpha_2$, and not in the corresponding closed interval,
as promised. But because of the absolute (and therefore uniform) convergence
of the integrals defining $G_n(\alpha)$ and $\varphi(\alpha)$, these functions must be continuous in
the closed interval $-\alpha_2 \le \alpha \le \alpha_2$. Since $\lim_{n\to\infty} G_n(\alpha) = G(\alpha)$ uniformly in
this interval, $G(\alpha)$ must also be continuous there. This implies that $\varphi(\alpha)$, the
m.g.f. of $X$, is identically equal to $G(\alpha)$ in the closed interval, and the proof is
complete.

It is perhaps worth while to point out explicitly that in the course of the 
foregoing argument we have proved this proposition: 

THEOREM 4. If a sequence of m.g.f.'s converges in an open interval containing
$\alpha = 0$, then it must converge uniformly in every closed subinterval of the open
interval, and the limit function is itself a m.g.f.
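A familiar case may make Theorems 3 and 4 concrete. The m.g.f. of a binomial variate with $n$ trials and success probability $\lambda/n$ converges, for every real $\alpha$, to the m.g.f. of a Poisson variate with mean $\lambda$. The Python sketch below is an illustration only: the choice $\lambda = 2$, the interval $[-1, 1]$, the grid size and the function names are assumptions made here. It tabulates the largest gap between the two m.g.f.'s on the closed interval, which shrinks as $n$ grows, in keeping with the uniform convergence asserted by Theorem 4.

```python
import math

def mgf_binomial(alpha, n, lam):
    # m.g.f. of a binomial variate with n trials and success probability lam / n.
    p = lam / n
    return (1.0 + p * (math.exp(alpha) - 1.0)) ** n

def mgf_poisson(alpha, lam):
    # m.g.f. of a Poisson variate with mean lam.
    return math.exp(lam * (math.exp(alpha) - 1.0))

def sup_gap(n, lam=2.0, a2=1.0, grid=201):
    # Largest gap between the two m.g.f.'s on an evenly spaced grid over [-a2, a2].
    pts = [-a2 + 2.0 * a2 * i / (grid - 1) for i in range(grid)]
    return max(abs(mgf_binomial(a, n, lam) - mgf_poisson(a, lam)) for a in pts)

for n in (10, 100, 1000, 100000):
    print(n, sup_gap(n))
```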
ON THE POWER FUNCTION OF THE ANALYSIS OF VARIANCE TEST¹
By ABRAHAM WALD 


Columbia University 


It is known² that the general problem of the analysis of variance can be
reduced by an orthogonal transformation to the following canonical form: Let the
variates $y_1, \dots, y_p, z_1, \dots, z_n$ be independently and normally distributed
with a common unknown variance $\sigma^2$. The mean values of $z_1, \dots, z_n$ are known
to be zero, and the mean values $\eta_1, \dots, \eta_p$ of the variates $y_1, \dots, y_p$ are
unknown. The canonical form of the analysis of variance test is the test of the
hypothesis that

(1)    $\eta_1 = \eta_2 = \cdots = \eta_r = 0 \quad (r \le p)$

where a single observation is made on each of the variates $y_1, \dots, y_p$,
$z_1, \dots, z_n$.

In the theory of the analysis of variance the test of the hypothesis (1) is
based on the critical region

(2)    $\dfrac{y_1^2 + \cdots + y_r^2}{z_1^2 + \cdots + z_n^2} \ge c$

where the constant $c$ is chosen so that the size of the critical region is equal to
the level of significance $\alpha$ we wish to have. The critical region (2) is identical
with the critical region

(3)    $\dfrac{y_1^2 + \cdots + y_r^2}{y_1^2 + \cdots + y_r^2 + z_1^2 + \cdots + z_n^2} \ge \dfrac{c}{c + 1}$.

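The identity of the regions (2) and (3) follows from the fact that $q \mapsto q/(1+q)$ is increasing for $q \ge 0$. A minimal Python sketch, with hypothetical function names and randomly generated sample points that serve only as an illustration, checks that both inequalities select the same points:

```python
import random

def in_region_2(y, z, r, c):
    # Region (2): (y_1^2 + ... + y_r^2) / (z_1^2 + ... + z_n^2) >= c,
    # written without division so that a zero denominator causes no error.
    num = sum(v * v for v in y[:r])
    den = sum(v * v for v in z)
    return num >= c * den

def in_region_3(y, z, r, c):
    # Region (3): the same numerator over (numerator + sum of z^2) >= c / (c + 1).
    num = sum(v * v for v in y[:r])
    den = sum(v * v for v in z)
    return num * (c + 1.0) >= c * (num + den)

rng = random.Random(0)           # fixed seed, so the illustration is reproducible
p, n, r, c = 4, 6, 2, 0.8        # assumed dimensions and constant
agree = all(
    in_region_2(y, z, r, c) == in_region_3(y, z, r, c)
    for y, z in (
        ([rng.gauss(0, 1) for _ in range(p)], [rng.gauss(0, 1) for _ in range(n)])
        for _ in range(1000)
    )
)
print(agree)
```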
It is known that the power function of the critical region (3) depends only on
the single parameter

(4)    $\lambda = \dfrac{\eta_1^2 + \cdots + \eta_r^2}{\sigma^2}$.

Denote the power function of the critical region (3) by $\beta_0(\lambda)$. P. L. Hsu has
proved³ the following optimum property of the region (3): Let $W$ be a critical
region which satisfies the following two conditions:

(a) The size of W is equal to the size of the region (3). 

1 Presented at a joint meeting of the Institute of Mathematical Statistics and the
American Mathematical Society in New York, December, 1941.

2 See for instance P. C. Tang, "The power function of the analysis of variance tests,"
Stat. Res. Mem., Vol. 2, 1938.

3 P. L. Hsu, "Analysis of variance from the power function standpoint," Biometrika,
January, 1941.
(b) The power function of $W$ depends on the single parameter $\lambda$.
Then $\beta(\lambda) \le \beta_0(\lambda)$ where $\beta(\lambda)$ denotes the power function of $W$.

Condition (b) is a serious restriction in Hsu's result. In this paper we shall
prove an optimum property of $\beta_0(\lambda)$ where $\beta_0(\lambda)$ is compared with the power
function of any other critical region of size equal to that of (3).

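For any given values $\eta'_{r+1}, \dots, \eta'_p, \sigma'$ and $\lambda$ denote by
$S(\eta'_{r+1}, \dots, \eta'_p, \sigma', \lambda)$ the sphere defined by the equations

(5)    $\eta_1^2 + \cdots + \eta_r^2 = \lambda \sigma'^2; \quad \eta_i = \eta'_i \ (i = r+1, \dots, p); \quad \sigma = \sigma'$.

For any region $W$ denote by $\beta_W(\eta_1, \dots, \eta_p, \sigma)$ the power function of $W$, i.e.
$\beta_W(\eta_1, \dots, \eta_p, \sigma)$ denotes the probability that the sample point will fall
within $W$ calculated under the assumption that $\eta_1, \dots, \eta_p$ and $\sigma$ are the true
values of the parameters. We will denote by $\gamma_W(\eta'_{r+1}, \dots, \eta'_p, \sigma', \lambda)$ the
integral of the power function $\beta_W(\eta_1, \dots, \eta_p, \sigma)$ over the surface
$S(\eta'_{r+1}, \dots, \eta'_p, \sigma', \lambda)$ divided by the area of $S(\eta'_{r+1}, \dots, \eta'_p, \sigma', \lambda)$, i.e.

(6)    $\gamma_W(\eta'_{r+1}, \dots, \eta'_p, \sigma', \lambda) = \dfrac{\int_{S(\eta'_{r+1}, \dots, \eta'_p, \sigma', \lambda)} \beta_W(\eta_1, \dots, \eta_p, \sigma')\, dA}{\int_{S(\eta'_{r+1}, \dots, \eta'_p, \sigma', \lambda)} dA}$.

We will prove the following

THEOREM: If $W$ is a critical region of size equal to that of (3), i.e.
$\beta_W(0, \dots, 0, \eta_{r+1}, \dots, \eta_p, \sigma) = \beta_0(0)$, then

(7)    $\gamma_W(\eta'_{r+1}, \dots, \eta'_p, \sigma', \lambda) \le \beta_0(\lambda)$

for arbitrary values $\eta'_{r+1}, \dots, \eta'_p, \sigma'$ and $\lambda$.

If $W$ satisfies Hsu's condition (b) then the power function $\beta_W(\eta_1, \dots, \eta_p, \sigma)$
is constant on the surface $S(\eta_{r+1}, \dots, \eta_p, \sigma, \lambda)$ and therefore
$\gamma_W(\eta_{r+1}, \dots, \eta_p, \sigma, \lambda) = \beta_W(\eta_1, \dots, \eta_p, \sigma)$. Hence Hsu's result is an
immediate consequence of our Theorem.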

Denote $\left|\sqrt{y_1^2 + \cdots + y_r^2 + z_1^2 + \cdots + z_n^2}\right|$ by $t$ and for any values
$a_{r+1}, \dots, a_p, b$ let $R(a_{r+1}, \dots, a_p, b)$ be the set of all sample points for which

    $y_i = a_i \ (i = r+1, \dots, p)$ and $t = b$.

For any region $W$ of the sample space we denote by $W(y_{r+1}, \dots, y_p, t)$ the
common part of $W$ and $R(y_{r+1}, \dots, y_p, t)$.

In order to prove our Theorem we first show the validity of the following

LEMMA 1: For any critical region $Z$ there exists a function $\varphi_Z(y_{r+1}, \dots, y_p, t)$
of the variables $y_{r+1}, \dots, y_p, t$ such that the critical region $Z^*$ defined by the
inequality

    $y_1^2 + \cdots + y_r^2 \ge \varphi_Z(y_{r+1}, \dots, y_p, t)$

satisfies the following two conditions:

(a) $\beta_Z(0, \dots, 0, \eta_{r+1}, \dots, \eta_p, \sigma) = \beta_{Z^*}(0, \dots, 0, \eta_{r+1}, \dots, \eta_p, \sigma)$,

(b) $\gamma_Z(\eta_{r+1}, \dots, \eta_p, \sigma, \lambda) \le \gamma_{Z^*}(\eta_{r+1}, \dots, \eta_p, \sigma, \lambda)$.

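PROOF: Denote by $P_Z(y_{r+1}, \dots, y_p, t)$ the conditional probability of
$Z(y_{r+1}, \dots, y_p, t)$ calculated under the condition that the sample point lies
in $R(y_{r+1}, \dots, y_p, t)$ and under the assumption that $\eta_1 = \cdots = \eta_r = 0$.
Denote by $F(d, t)$ the conditional probability that

    $y_1^2 + \cdots + y_r^2 \ge d$

calculated under the condition that the sample point lies in $R(y_{r+1}, \dots, y_p, t)$
and under the assumption that $\eta_1 = \cdots = \eta_r = 0$. It is easy to verify that the
values of $F(d, t)$ and $P_Z(y_{r+1}, \dots, y_p, t)$ do not depend on the unknown
parameters $\eta_{r+1}, \dots, \eta_p, \sigma$. Since $F(d, t)$ is a continuous function of $d$ and
since $F(t^2, t) = 0$, there exists a function $\varphi_Z(y_{r+1}, \dots, y_p, t)$ such that

    $F[\varphi_Z(y_{r+1}, \dots, y_p, t), t] = P_Z(y_{r+1}, \dots, y_p, t)$.

For this function $\varphi_Z(y_{r+1}, \dots, y_p, t)$ the region $Z^*$ certainly satisfies condition
(a) of Lemma 1. We will show that condition (b) is also satisfied. Consider
the ratio

(8)    $\dfrac{\int_{S(\eta_{r+1}, \dots, \eta_p, \sigma, \lambda)} \exp\left[-\dfrac{1}{2\sigma^2} \sum_{i=1}^{p} (y_i - \eta_i)^2 - \dfrac{1}{2\sigma^2} \sum_{\alpha=1}^{n} z_\alpha^2\right] dA}{\exp\left[-\dfrac{1}{2\sigma^2}\left(\sum_{i=1}^{r} y_i^2 + \sum_{i=r+1}^{p} (y_i - \eta_i)^2 + \sum_{\alpha=1}^{n} z_\alpha^2\right)\right]} = e^{-\lambda/2} \int_{S(\eta_{r+1}, \dots, \eta_p, \sigma, \lambda)} e^{\sum_{i=1}^{r} y_i \eta_i/\sigma^2}\, dA$.

Denote $\sum_{i=1}^{r} y_i^2$ by $r_y^2$. Then we have

(9)    $\int_{S(\eta_{r+1}, \dots, \eta_p, \sigma, \lambda)} e^{\sum_{i=1}^{r} y_i \eta_i/\sigma^2}\, dA = \int_{S(\eta_{r+1}, \dots, \eta_p, \sigma, \lambda)} e^{\sqrt{\lambda}\, r_y \cos[\alpha(\eta)]/\sigma}\, dA$,

where $\alpha(\eta)$ denotes the angle ($0 \le \alpha(\eta) \le \pi$) between the vector $y$ with the
components $y_1, \dots, y_r$ and the vector $\eta$ with the components $\eta_1, \dots, \eta_r$.
Because of the symmetry of the sphere, the value of the right hand side of (9)
is not changed if we substitute $\beta(\eta)$ for $\alpha(\eta)$ where $\beta(\eta)$ denotes the
angle ($0 \le \beta(\eta) \le \pi$) between the vector $\eta$ and an arbitrarily chosen fixed vector
$u$. Hence the value of the right hand side of (9) depends only on $r_y$, i.e.

(10)    $\int_{S(\eta_{r+1}, \dots, \eta_p, \sigma, \lambda)} e^{\sqrt{\lambda}\, r_y \cos[\alpha(\eta)]/\sigma}\, dA = \int_{S(\eta_{r+1}, \dots, \eta_p, \sigma, \lambda)} e^{\sqrt{\lambda}\, r_y \cos[\beta(\eta)]/\sigma}\, dA = I(r_y)$.

Now we will show that $I(r_y)$ is a monotonically increasing function of $r_y$. We
have

(11)    $\dfrac{dI(r_y)}{dr_y} = \dfrac{\sqrt{\lambda}}{\sigma} \int_{S(\eta_{r+1}, \dots, \eta_p, \sigma, \lambda)} \cos[\beta(\eta)]\, e^{\sqrt{\lambda}\, r_y \cos[\beta(\eta)]/\sigma}\, dA$.

Denote by $\omega_1$ the subset of $S(\eta_{r+1}, \dots, \eta_p, \sigma, \lambda)$ in which $0 \le \beta(\eta) \le \pi/2$ and by
$\omega_2$ the subset in which $\pi/2 \le \beta(\eta) \le \pi$. Because of the symmetry of the sphere
we obviously have

(12)    $\int_{\omega_2} \cos[\beta(\eta)]\, e^{\sqrt{\lambda}\, r_y \cos[\beta(\eta)]/\sigma}\, dA = \int_{\omega_1} \cos[\pi - \beta(\eta)]\, e^{\sqrt{\lambda}\, r_y \cos[\pi - \beta(\eta)]/\sigma}\, dA = -\int_{\omega_1} \cos[\beta(\eta)]\, e^{-\sqrt{\lambda}\, r_y \cos[\beta(\eta)]/\sigma}\, dA$.

Hence

(13)    $\dfrac{dI(r_y)}{dr_y} = \dfrac{\sqrt{\lambda}}{\sigma} \int_{\omega_1} \cos[\beta(\eta)] \left\{ e^{\sqrt{\lambda}\, r_y \cos[\beta(\eta)]/\sigma} - e^{-\sqrt{\lambda}\, r_y \cos[\beta(\eta)]/\sigma} \right\} dA$.

The right hand side of (13) is positive. Hence $I(r_y)$, and therefore also the left
hand side of (8), is a monotonically increasing function of $r_y$.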
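The monotonicity of $I(r_y)$ can be checked numerically in a special case. The sketch below assumes $r = 2$, so that the sphere $S$ reduces to a circle and the surface integral to an integral over the angle $\beta$; the function name, the number of quadrature points and the values $\lambda = 4$, $\sigma = 1$ are illustrative assumptions only. It computes the average of $e^{\sqrt{\lambda}\, r_y \cos\beta/\sigma}$ over the circle, that is, $I(r_y)$ divided by the circumference.

```python
import math

def sphere_average(r_y, lam, sigma=1.0, points=2000):
    # Mean of exp(sqrt(lam) * r_y * cos(beta) / sigma) over a circle, i.e. I(r_y)
    # divided by the circumference in the assumed case r = 2 (midpoint rule in beta).
    a = math.sqrt(lam) * r_y / sigma
    total = 0.0
    for j in range(points):
        beta = 2.0 * math.pi * (j + 0.5) / points
        total += math.exp(a * math.cos(beta))
    return total / points

values = [sphere_average(r_y, lam=4.0) for r_y in (0.0, 0.5, 1.0, 2.0, 4.0)]
print(values)
```

In this special case the average is the modified Bessel function $I_0(\sqrt{\lambda}\, r_y/\sigma)$, which is increasing in its argument, in agreement with (13).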

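Let $P_1(y_{r+1}, \dots, y_p, t, \eta_1, \dots, \eta_p, \sigma)\, dy_{r+1} \cdots dy_p\, dt$ be the probability
that the sample point will fall in the intersection of $Z$ and the set

    $y_i - \tfrac{1}{2} dy_i \le y_i \le y_i + \tfrac{1}{2} dy_i \ (i = r+1, \dots, p), \quad t - \tfrac{1}{2} dt \le t \le t + \tfrac{1}{2} dt$.

Similarly let $P_2(y_{r+1}, \dots, y_p, t, \eta_1, \dots, \eta_p, \sigma)\, dy_{r+1} \cdots dy_p\, dt$ be the
unconditional probability that the sample point will fall in the intersection of $Z^*$
and the set

    $y_i - \tfrac{1}{2} dy_i \le y_i \le y_i + \tfrac{1}{2} dy_i \ (i = r+1, \dots, p), \quad t - \tfrac{1}{2} dt \le t \le t + \tfrac{1}{2} dt$.

Since the function $\varphi_Z(y_{r+1}, \dots, y_p, t)$ has been defined so that

    $P_Z(y_{r+1}, \dots, y_p, t) = F[\varphi_Z(y_{r+1}, \dots, y_p, t), t]$,

we obviously have

(14)    $P_1(y_{r+1}, \dots, y_p, t, 0, \dots, 0, \eta_{r+1}, \dots, \eta_p, \sigma) = P_2(y_{r+1}, \dots, y_p, t, 0, \dots, 0, \eta_{r+1}, \dots, \eta_p, \sigma)$.

Using a lemma⁴ by Neyman and Pearson, we easily obtain

(15)    $\int_{S(\eta_{r+1}, \dots, \eta_p, \sigma, \lambda)} P_2(y_{r+1}, \dots, y_p, t, \eta_1, \dots, \eta_p, \sigma)\, dA \ge \int_{S(\eta_{r+1}, \dots, \eta_p, \sigma, \lambda)} P_1(y_{r+1}, \dots, y_p, t, \eta_1, \dots, \eta_p, \sigma)\, dA$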

4 J. Neyman and E. S. Pearson, "Contributions to the theory of testing statistical
hypotheses," Stat. Res. Mem., Vol. 1, London, 1936.
from (14) and the fact that the left hand side of (8) is a monotonically increasing
function of $r_y^2 = y_1^2 + \cdots + y_r^2$. Condition (b) is an immediate consequence
of (15). Hence Lemma 1 is proved.

For the proof of our theorem we will also need the following 

LEMMA 2: Let $v_1, \dots, v_k$ be $k$ normally and independently distributed variates
with a common variance $\sigma^2$. Denote the mean value of $v_i$ by $\alpha_i$ $(i = 1, \dots, k)$ and
let $f(v_1, \dots, v_k, \sigma)$ be a function of the variables $v_1, \dots, v_k$ and $\sigma$ which does not
involve the mean values $\alpha_1, \dots, \alpha_k$. Then, if the expected value of $f(v_1, \dots, v_k, \sigma)$
is equal to zero, $f(v_1, \dots, v_k, \sigma)$ is identically equal to zero, except perhaps on a set
of measure zero.

PROOF: Lemma 2 is obviously proved for all values of $\sigma$ if we prove it for
$\sigma = 1$. Hence we will assume that $\sigma = 1$. It is known that a $k$-variate
distribution which has moments equal to those of the joint distribution of
$v_1, \dots, v_k$ must be identical with the joint distribution of $v_1, \dots, v_k$. That is to
say, the joint distribution of $v_1, \dots, v_k$ is uniquely determined by its moments.
Hence if

(16)    $\int_{-\infty}^{+\infty} \cdots \int_{-\infty}^{+\infty} v_1^{r_1} v_2^{r_2} \cdots v_k^{r_k}\, g(v_1, \dots, v_k)\, e^{-\frac{1}{2} \sum_{i=1}^{k} (v_i - \alpha_i)^2}\, dv_1 \cdots dv_k = 0$

for any set $(r_1, \dots, r_k)$ of non-negative integers, then $g(v_1, \dots, v_k)$ must be
equal to zero except perhaps on a set of measure zero. Now let $f(v_1, \dots, v_k)$
be a function whose expected value is zero, i.e.

(17)    $\int_{-\infty}^{+\infty} \cdots \int_{-\infty}^{+\infty} f(v_1, \dots, v_k)\, e^{-\frac{1}{2} \sum_{i=1}^{k} (v_i - \alpha_i)^2}\, dv_1 \cdots dv_k = 0$

identically in $\alpha_1, \dots, \alpha_k$. From (17) it follows that

(18)    $\int_{-\infty}^{+\infty} \cdots \int_{-\infty}^{+\infty} f(v_1, \dots, v_k)\, e^{-\frac{1}{2} \sum_{i=1}^{k} v_i^2 + \sum_{i=1}^{k} \alpha_i v_i}\, dv_1 \cdots dv_k = 0$

identically in $\alpha_1, \dots, \alpha_k$. Differentiating the left hand side of (18) $r_1$ times
with respect to $\alpha_1$, $r_2$ times with respect to $\alpha_2$, $\dots$, and $r_k$ times with respect to
$\alpha_k$, we obtain

(19)    $\int_{-\infty}^{+\infty} \cdots \int_{-\infty}^{+\infty} v_1^{r_1} \cdots v_k^{r_k}\, f(v_1, \dots, v_k)\, e^{-\frac{1}{2} \sum_{i=1}^{k} (v_i - \alpha_i)^2}\, dv_1 \cdots dv_k = 0$.

From (16) and (19) it follows that $f(v_1, \dots, v_k) = 0$. Hence Lemma 2 is
proved.
Using Lemmas 1 and 2 we can easily prove our theorem. Because of Lemma 1
we can restrict ourselves to critical regions $W$ which are given by an inequality
of the following type

    $y_1^2 + \cdots + y_r^2 \ge \varphi(y_{r+1}, \dots, y_p, t)$

where $\varphi(y_{r+1}, \dots, y_p, t)$ is some function of $y_{r+1}, \dots, y_p$ and $t$. The above
inequality can be written as

(20)    $\dfrac{y_1^2 + \cdots + y_r^2}{y_1^2 + \cdots + y_r^2 + z_1^2 + \cdots + z_n^2} \ge \psi(y_{r+1}, \dots, y_p, t)$.

For any given values of $y_{r+1}, \dots, y_p, t$ denote by $P(y_{r+1}, \dots, y_p, t)$ the
conditional probability that (20) holds calculated under the assumption that
$\eta_1 = \cdots = \eta_r = 0$. It is obvious that $P(y_{r+1}, \dots, y_p, t)$ does not depend on
the unknown parameters $\eta_{r+1}, \dots, \eta_p, \sigma$. If we denote by $W$ the critical
region defined by the inequality (20), we have

(21)    $\beta_W(0, \dots, 0, \eta_{r+1}, \dots, \eta_p, \sigma) = \int \cdots \int P(y_{r+1}, \dots, y_p, t)\, p_1(y_{r+1}, \dots, y_p, \eta_{r+1}, \dots, \eta_p, \sigma)\, p_2(t, \sigma)\, dy_{r+1} \cdots dy_p\, dt$

where $p_1(y_{r+1}, \dots, y_p, \eta_{r+1}, \dots, \eta_p, \sigma)$ denotes the joint probability density
function of $y_{r+1}, \dots, y_p$ and $p_2(t, \sigma)$ denotes the probability density function
of $t$ calculated under the assumption that $\eta_1 = \cdots = \eta_r = 0$. In order to
satisfy the condition of our Theorem, the function $\psi$ in (20) must be chosen so
that

(22)    $\int \cdots \int P(y_{r+1}, \dots, y_p, t)\, p_1(y_{r+1}, \dots, y_p, \eta_{r+1}, \dots, \eta_p, \sigma)\, p_2(t, \sigma)\, dy_{r+1} \cdots dy_p\, dt = \beta_0(0)$.

Let

(23)    $\int_0^{\infty} P(y_{r+1}, \dots, y_p, t)\, p_2(t, \sigma)\, dt = Q(y_{r+1}, \dots, y_p, \sigma)$.

Then we obtain from (22)

(24)    $\int_{-\infty}^{+\infty} \cdots \int_{-\infty}^{+\infty} Q(y_{r+1}, \dots, y_p, \sigma)\, p_1\, dy_{r+1} \cdots dy_p = \beta_0(0)$.

From (24) and Lemma 2 it follows that

(25)    $Q(y_{r+1}, \dots, y_p, \sigma) = \beta_0(0)$

except perhaps on a set of measure zero. From (23), (25) and a result⁵ by P. L.
Hsu we obtain

(26)    $P(y_{r+1}, \dots, y_p, t) = \beta_0(0)$

except perhaps on a set of measure zero.
It follows easily from (26) that $\psi(y_{r+1}, \dots, y_p, t)$ is equal to a fixed constant
except perhaps on a set of measure zero. This proves our Theorem.


5 P. L. Hsu, ‘‘Notes on Hotelling’s generalized T,’’ Annals of Math. Stat., Vol. 9, p. 237. 
A NOTE ON THE ESTIMATION OF SOME MEAN VALUES FOR A
BIVARIATE DISTRIBUTION


By EDWARD PAULSON¹


Columbia University 


In this paper two problems are discussed which were suggested by the theory
of representative sampling [1], but which also occur in several other fields. The
first problem is to set up confidence limits for $m_x/m_y$, the ratio of the mean values
of the variates $x$ and $y$. This comes up in the following situation. Let a
population $\Pi$ consist of $N$ units $x_1, x_2, \dots, x_N$ and suppose we wish to set up
confidence limits for the mean $\bar X = \sum_{i=1}^{N} x_i / N$. Also assume the population $\Pi$
has been divided into $M$ groups, let $v_j$ be the number of individuals in the $j$th
group and $u_j$ be the sum of the values of $x$ for the $v_j$ individuals in the $j$th group,
so

    $\bar X = \dfrac{u_1 + u_2 + \cdots + u_M}{v_1 + v_2 + \cdots + v_M} = \dfrac{m_u}{m_v}$.

Now if a random sample of $n$ out of the $M$ groups is taken, yielding observations
$(u_1, v_1), (u_2, v_2), \dots, (u_n, v_n)$, and $N$ is unknown, the determination of
confidence limits for $\bar X$ clearly becomes a special case of the first problem. The
distribution of a ratio, discussed by Geary [2], does not seem to be well adapted
for this purpose.

The second problem, which is of greater practical interest, arises when we
again have a random sample $(u_1, v_1), \dots, (u_n, v_n)$ of $n$ out of $M$ groups and $N$
and $M$ are known. The standard estimate of $\bar X$ that has usually been made
is $M\bar u/N$, where $\bar u = \sum_{i=1}^{n} u_i / n$. This estimate does not utilize the fact that
the $n$ observations on $v$ can be used to increase the precision of the estimate of the
numerator of $\bar X$. This is a special case of problem 2, which we can now formulate
as how best to estimate $m_x$ (the mean value of a trait $x$) both by a point and
by an interval, when for each unit in the sample observations both on $x$ and
on a correlated variate $y$ are obtainable, and $m_y$ is known a priori. Situations
of this type occur fairly often. It is possible to reduce the second problem to
the first by using $(\bar x/\bar y)\, m_y$ as the estimate of $m_x$, and by multiplying the confidence
limits for $m_x/m_y$ by $m_y$ to secure limits for $m_x$, but this will not usually be the most
efficient procedure.
In both problems two cases will be distinguished: (a) when $\sigma_x^2$, $\sigma_y^2$ and $\rho$ are
known a priori, and (b) when they are unknown. To determine confidence

1 Work done under a grant-in-aid from the Carnegie Corporation of New York. 
limits for $m_x/m_y$ it will first be assumed that the probability density $f(x, y)$ of $x$
and $y$ is

(1.1)    $f(x, y) = \dfrac{1}{2\pi \sigma_x \sigma_y \sqrt{1 - \rho^2}} \exp\left\{ -\dfrac{1}{2(1 - \rho^2)} \left[ \left(\dfrac{x - m_x}{\sigma_x}\right)^2 - 2\rho \left(\dfrac{x - m_x}{\sigma_x}\right)\left(\dfrac{y - m_y}{\sigma_y}\right) + \left(\dfrac{y - m_y}{\sigma_y}\right)^2 \right] \right\}$.

Denote the ratio $m_x/m_y$ by $K$ (assuming $m_y \ne 0$), and suppose it is desired to test
the hypothesis that $K = K_0$ on the basis of a sample of $n$ independent
observations $(x_1, y_1), \dots, (x_n, y_n)$.

Let $z_i = x_i - K y_i$ and $\bar z = \sum_{i=1}^{n} z_i / n$. Since $z$ is a linear function of $x$ and $y$ it
must be normally distributed, and its mean value is obviously zero. Therefore

(1.2)    $u = \dfrac{\sqrt{n}\, \bar z}{\sigma_z} = \dfrac{\sqrt{n}\, (\bar x - K \bar y)}{\sqrt{\sigma_x^2 - 2K\rho\sigma_x\sigma_y + K^2\sigma_y^2}}$

will be normally distributed about zero with unit variance, and the hypothesis
is rejected if $|u(K_0)| > u_\alpha$, where $\dfrac{1}{\sqrt{2\pi}} \int_{u_\alpha}^{\infty} e^{-\frac{1}{2}t^2}\, dt = \tfrac{1}{2}\alpha$. It is easy to show
that this test is equivalent to that based on the likelihood-ratio.

Confidence limits for $K$ would now be given by values of $K$ satisfying the
inequality $\left|\sqrt{n}\, \bar z / \sigma_z\right| \le u_\alpha$, provided they always constituted a closed non-empty
interval. This is equivalent here to the requirement that $K$ be a real valued
monotonic function of $u$ in the interval $-\infty < u < \infty$; this requirement is
unfortunately never exactly fulfilled, as can be seen from the graph of (1.2)
(in the $u$, $K$ plane), for the curve has two horizontal asymptotes
$u = \pm \sqrt{n}\, \bar y / \sigma_y$ and one maximum or minimum point (unless
$\bar x / \bar y = \rho\, \sigma_x / \sigma_y$). However, $K$ will always be a monotonic function of $u$ in the
interval $-u_\alpha < u < u_\alpha$ provided $\left|\sqrt{n}\, \bar y / \sigma_y\right| > u_\alpha$. Since $m_y \ne 0$, by taking $n$
sufficiently large the probability that $\left|\sqrt{n}\, \bar y / \sigma_y\right| < u_\alpha$ can be made arbitrarily
small. Moreover, for values of $\alpha$ ordinarily used, in most practical problems the
value of $m_y / \sigma_y$ will be such that even for quite small samples the probability
that $\left|\sqrt{n}\, \bar y / \sigma_y\right| < u_\alpha$ (that is, the probability of getting a sample for which the
values of $K$ that are accepted will not form a real interval) will be quite
negligible. For example, let $\alpha$ have the conventional value .05, and suppose
$m_y / \sigma_y = 2$; then for $n = 9$, Prob $\{\left|\sqrt{n}\, \bar y / \sigma_y\right| < 1.96\} < 10^{-4}$, and for $n = 16$,
Prob $\{\left|\sqrt{n}\, \bar y / \sigma_y\right| < 1.96\} < 10^{-9}$.
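These probabilities follow from the fact that $\sqrt{n}\, \bar y / \sigma_y$ is normally distributed with mean $\sqrt{n}\, m_y / \sigma_y$ and unit variance. A short Python sketch (the function names are hypothetical, and the normal distribution function is obtained from the standard library's `math.erfc`) evaluates them for the two cases quoted above:

```python
import math

def Phi(x):
    # Standard normal distribution function.
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def prob_no_interval(n, ratio, u_alpha=1.96):
    # Probability that |sqrt(n) * ybar / sigma_y| < u_alpha, where
    # sqrt(n) * ybar / sigma_y is normal with mean sqrt(n) * ratio and unit variance,
    # and ratio stands for m_y / sigma_y.
    mu = math.sqrt(n) * ratio
    return Phi(u_alpha - mu) - Phi(-u_alpha - mu)

p9 = prob_no_interval(9, 2.0)
p16 = prob_no_interval(16, 2.0)
print(p9, p16)
```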

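Subject to these rather weak restrictions on the order of magnitude of $n$ and
$m_y / \sigma_y$, the confidence limits for $K$ are

(1.3)    $\dfrac{(n \bar x \bar y - u_\alpha^2 \rho \sigma_x \sigma_y) \pm \sqrt{(n \bar x \bar y - u_\alpha^2 \rho \sigma_x \sigma_y)^2 - (n \bar y^2 - u_\alpha^2 \sigma_y^2)(n \bar x^2 - u_\alpha^2 \sigma_x^2)}}{n \bar y^2 - u_\alpha^2 \sigma_y^2}$.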
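Formula (1.3) gives the two roots in $K$ of the quadratic equation obtained by squaring $|u(K)| = u_\alpha$ in (1.2). The following Python sketch is illustrative only: the function names and the numerical inputs are assumptions made here. It computes the limits and allows one to confirm that $|u|$ equals $u_\alpha$ at each of them.

```python
import math

def ratio_limits(n, xbar, ybar, sx, sy, rho, u_alpha=1.96):
    # Roots in K of n (xbar - K ybar)^2 = u_alpha^2 (sx^2 - 2 K rho sx sy + K^2 sy^2),
    # i.e. formula (1.3). The text's condition |sqrt(n) ybar / sy| > u_alpha is required.
    u2 = u_alpha ** 2
    a = n * ybar ** 2 - u2 * sy ** 2
    if a <= 0:
        raise ValueError("accepted values of K do not form a finite interval")
    b = n * xbar * ybar - u2 * rho * sx * sy
    c = n * xbar ** 2 - u2 * sx ** 2
    root = math.sqrt(b * b - a * c)
    return (b - root) / a, (b + root) / a

def u_stat(K, n, xbar, ybar, sx, sy, rho):
    # The statistic u of (1.2).
    return math.sqrt(n) * (xbar - K * ybar) / math.sqrt(
        sx ** 2 - 2 * K * rho * sx * sy + K ** 2 * sy ** 2)

# Hypothetical inputs: n = 25, xbar = 10, ybar = 5, sigma_x = 2, sigma_y = 1, rho = 0.5.
lo, hi = ratio_limits(25, 10.0, 5.0, 2.0, 1.0, 0.5)
print(lo, hi)
```

The observed ratio $\bar x / \bar y$ always lies between the two limits, since $u = 0$ there.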
In case (b) when $\sigma_x^2$, $\sigma_y^2$, and $\rho$ are unknown, each $z_i = x_i - K y_i$ is still
normally and independently distributed with zero mean and a common variance.
It follows that

(1.4)    $t = \dfrac{\sqrt{n}\, \bar z}{\sqrt{\sum_{i=1}^{n} (z_i - \bar z)^2 / (n - 1)}} = \dfrac{\sqrt{n}\, (\bar x - K \bar y)}{\sqrt{s_x^2 - 2 r s_x s_y K + s_y^2 K^2}}$

will have Student's distribution with $n - 1$ degrees of freedom. Subject to
practically the same restriction as before, the confidence limits for $K$ as
determined from (1.4) are

(1.5)    $\dfrac{(n \bar x \bar y - t_\alpha^2 r s_x s_y) \pm \sqrt{(n \bar x \bar y - t_\alpha^2 r s_x s_y)^2 - (n \bar y^2 - t_\alpha^2 s_y^2)(n \bar x^2 - t_\alpha^2 s_x^2)}}{n \bar y^2 - t_\alpha^2 s_y^2}$

where $t_\alpha$ is the critical value of Student's distribution (for $n - 1$ degrees of
freedom), $s_x^2 = \dfrac{\sum (x_i - \bar x)^2}{n - 1}$, $s_y^2 = \dfrac{\sum (y_i - \bar y)^2}{n - 1}$, and $r$ is the sample correlation
between $x$ and $y$.
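When only the sample is available, (1.5) can be evaluated directly from the data. The sketch below is an illustration under stated assumptions: the nine pairs of observations are invented for the purpose, the function names are hypothetical, and the critical value 2.306 (the two-sided 5 per cent point of Student's distribution with 8 degrees of freedom) is supplied by hand because the Python standard library does not provide it.

```python
import math

def ratio_limits_t(x, y, t_alpha):
    # Formula (1.5); t_alpha must be supplied for n - 1 degrees of freedom.
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((v - xbar) ** 2 for v in x) / (n - 1)                      # s_x^2
    syy = sum((v - ybar) ** 2 for v in y) / (n - 1)                      # s_y^2
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) / (n - 1)   # r s_x s_y
    t2 = t_alpha ** 2
    a = n * ybar ** 2 - t2 * syy
    if a <= 0:
        raise ValueError("accepted values of K do not form a finite interval")
    b = n * xbar * ybar - t2 * sxy
    c = n * xbar ** 2 - t2 * sxx
    root = math.sqrt(b * b - a * c)
    return (b - root) / a, (b + root) / a

def t_stat(K, x, y):
    # Student's t of (1.4), computed from z_i = x_i - K y_i.
    n = len(x)
    z = [a - K * b for a, b in zip(x, y)]
    zbar = sum(z) / n
    s2 = sum((v - zbar) ** 2 for v in z) / (n - 1)
    return math.sqrt(n) * zbar / math.sqrt(s2)

# Invented data, for illustration only.
x = [4.1, 5.0, 6.2, 5.5, 4.8, 6.0, 5.1, 4.4, 5.9]
y = [2.0, 2.6, 3.1, 2.7, 2.3, 3.0, 2.4, 2.2, 2.9]
lo, hi = ratio_limits_t(x, y, t_alpha=2.306)
print(lo, hi)
```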

When the distribution of $x$ and $y$ deviates considerably from a bivariate
normal one, it would still appear that as a practical matter much the same
methods could be used. The basis for this is the fact that there is considerable
experimental evidence [3], [4] to show that the distribution of the mean of a
sample drawn from any population likely to be encountered in practice will
approach normality very rapidly even for $n$ quite small. Hence $\bar z$ and $u$ can be
regarded as normally distributed for $n$ say $> 25$, and the confidence limits for
$m_x/m_y$ will then be given by (1.3); in case (b) a somewhat larger sample is required
to diminish the error in estimating $\sigma_z$. But for $n$ say $> 50$, $t$ will have a
distribution close to normal and the confidence limits for $K$ are given by (1.5) (with
$t_\alpha$ replaced by $u_\alpha$). The statements for the non-normal case appear as a
practical matter to also hold when the sample is drawn from a finite population of $N$
units without replacement if $N - n$ is not too small, provided $n$ is replaced by
$n\, \dfrac{N - 1}{N - n}$, for now $\sigma^2_{(\bar x - K \bar y)} = \dfrac{1}{n} \left(\dfrac{N - n}{N - 1}\right) [\sigma_x^2 - 2\rho\sigma_x\sigma_y K + \sigma_y^2 K^2]$.
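In the second problem we again start by assuming the distribution of $x$ and $y$ is
given by (1.1). For case (a), $m_x$ is the only unknown parameter. If
$P = \prod_{i=1}^{n} f(x_i, y_i \mid m_x)$ and $\varphi = \dfrac{\partial \log P}{\partial m_x}$, then

    $\varphi = \dfrac{1}{2(1 - \rho^2)} \left\{ \dfrac{2 \sum (x_i - m_x)}{\sigma_x^2} - \dfrac{2\rho}{\sigma_x \sigma_y} \sum (y_i - m_y) \right\}$,

and the maximum likelihood estimate $\hat m_1$ of $m_x$ is

(1.6)    $\hat m_1 = \bar x - \dfrac{\sigma_{xy}}{\sigma_y^2} (\bar y - m_y)$,

where $\sigma_{xy} = \rho \sigma_x \sigma_y$. Also $\hat m_1$ is a sufficient statistic, and the confidence interval
given by the set of values of $m_x$ satisfying

    $\dfrac{\sqrt{n}\, \left| \bar x - \dfrac{\sigma_{xy}}{\sigma_y^2} (\bar y - m_y) - m_x \right|}{\sigma_x \sqrt{1 - \rho^2}} \le u_\alpha$

will be a "shortest unbiased confidence interval" in the sense of Neyman.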
Case (b) will be more important, since the exact values of the variances and
covariance will usually be unknown. By analogy with (1.6), a similar estimate
of $m_x$ for this case is

(1.7)    $\hat m_2 = \bar x - \dfrac{s_{xy}}{s_y^2} (\bar y - m_y)$.

This is precisely the least square estimate of $x_i$ corresponding to $y_i = m_y$, and
has been used for this problem before; for example, it is discussed by Cochran [5].
We shall discuss some additional aspects of the problem, and also mention the
application to the special case of representative sampling by groups.
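The estimate just described is the sample mean of $x$ corrected along the fitted regression line of $x$ on $y$ to the known value $m_y$. A minimal Python sketch follows; the function name and the artificial data are assumptions made for illustration. When the observations lie exactly on a line $x = A + By$, the estimate returns $A + B m_y$.

```python
def regression_estimate(x, y, m_y):
    # Estimate of m_x: xbar - (s_xy / s_y^2) (ybar - m_y), with m_y known a priori.
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    syy = sum((b - ybar) ** 2 for b in y)
    # The (n - 1) divisors of s_xy and s_y^2 cancel in the ratio.
    return xbar - (sxy / syy) * (ybar - m_y)

# Artificial data lying exactly on the line x = 2 + 3 y.
y = [1.0, 2.0, 4.0, 7.0]
x = [2.0 + 3.0 * v for v in y]
estimate = regression_estimate(x, y, m_y=3.0)
print(estimate)
```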

When the bivariate distribution of $x$ and $y$ is such that the conditional
distribution of each $x_i$ is normal with mean $A + B y_i$ and a common variance, then
Professor Wald has suggested that exact confidence limits for $m_x$ for small
samples can be secured by using the standard methods of the theory of least
squares. The resulting confidence limits are easily seen to be


t ” 
Mo + on al a 
a/n —2 


where 
and $t_\alpha$ is the critical value of Student's distribution with $n - 2$ degrees of
freedom at a level of significance $= \alpha$.

The requirement that the regression of x on y be linear is rather stringent, 
although it may often be fulfilled, especially in the case of representative sampling 
mentioned in the opening paragraph. When the regression of x on y is non- 
linear, the estimate given by (1.7) requires some further justification. Let 
$U_{ij} = E(x^i y^j)$, where $E$ denotes the mean value, and assume that we have $n$
independent pairs of observations and that the moments Uy, Un, Un, Uo, 
Uce , Us , Uog and U2: are all finite. It then follows from a theorem of Doob [6] 
that $\sqrt{n}\, (\hat m_2 - m_x)$ tends to a limiting distribution with increasing $n$ which is
normal with zero mean and variance equal to $\sigma_x^2 (1 - \rho^2)$.
The estimate $\bar x$ is clearly always less efficient than $\hat m_2$ unless $\rho = 0$. The
estimate $(\bar x / \bar y)\, m_y$ is known to have a large sample variance

    $V = \dfrac{1}{n} \left[ \sigma_x^2 - 2 \left(\dfrac{m_x}{m_y}\right) \sigma_{xy} + \left(\dfrac{m_x}{m_y}\right)^2 \sigma_y^2 \right]$.

So $(\bar x / \bar y)\, m_y$ is always less efficient than $\hat m_2$ unless $m_x = \dfrac{\sigma_{xy}}{\sigma_y^2}\, m_y$, at which point $V$
attains its minimum value $\dfrac{\sigma_x^2 (1 - \rho^2)}{n}$. In fact $\hat m_2$ can be easily shown to have
an efficiency $\ge$ any other statistic of the class $Q$ (which includes $\bar x$ and $(\bar x / \bar y)\, m_y$)
consisting of all statistics $q$ satisfying two conditions: (1) that $\sqrt{n}\, (q - m_x)$
have a distribution approaching normality with zero mean and finite variance
$\sigma_q^2$, and (2) $\sigma_q^2$ be independent of the joint density function of $x$ and $y$, involving
only certain of the moments $U_{ij}$. A rather artificial member of the class $Q$ is
& log 7 Sz 


3 es 3 rr ° ° . . 
q= (Wg — Wm,). The proof consists merely in observing that 


log m, < 
if for any bivariate distribution $\sigma^2_{\hat m_2} = \sigma_x^2 (1 - \rho^2) > \sigma_q^2$, this would also have to
be true when the distribution of $x$ and $y$ is a bivariate normal one, which is
impossible, since $\sigma_x^2 (1 - \rho^2)$ is then the variance of $\sqrt{n}\, (\hat m_1 - m_x)$, $\hat m_1$ being
the maximum likelihood statistic.
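The comparison of the three estimates can be made concrete by evaluating their large sample variances for given population constants. In the Python sketch below the function name and the numerical values are assumptions chosen for illustration; the three expressions are $\sigma_x^2/n$ for $\bar x$, the variance $V$ of the ratio estimate, and $\sigma_x^2(1 - \rho^2)/n$ for the regression estimate.

```python
def large_sample_variances(sx, sy, rho, mx, my, n):
    # Large sample variances of the mean, the ratio estimate and the regression estimate.
    sxy = rho * sx * sy
    K = mx / my
    v_mean = sx ** 2 / n
    v_ratio = (sx ** 2 - 2 * K * sxy + K ** 2 * sy ** 2) / n
    v_regr = sx ** 2 * (1 - rho ** 2) / n
    return v_mean, v_ratio, v_regr

# Hypothetical population constants.
v_mean, v_ratio, v_regr = large_sample_variances(2.0, 1.0, 0.6, 5.0, 3.0, 50)
print(v_mean, v_ratio, v_regr)

# At m_x = (sigma_xy / sigma_y^2) m_y the ratio estimate attains the same variance.
_, v_ratio_min, v_regr_same = large_sample_variances(2.0, 1.0, 0.6, 3.6, 3.0, 50)
print(v_ratio_min, v_regr_same)
```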

For moderate values of $n$, say $n > 100$, fairly exact confidence limits for $m_x$
will be given by $\hat m_2 \pm \dfrac{u_\alpha}{\sqrt{n}} \sqrt{s_x^2 (1 - r^2)}$. When the sample is drawn from a
finite population of $N$ units without replacement, the confidence limits for
$n > 100$ are $\hat m_2 \pm \dfrac{u_\alpha}{\sqrt{n}} \sqrt{\dfrac{N - n}{N - 1}}\, \sqrt{s_x^2 (1 - r^2)}$.


In the problem of estimating $m_x = \bar X$ for the population $\Pi$, discussed in the
opening paragraph, which consists of $N$ individuals divided into $M$ groups, on
the basis of a random sample $(u_1, v_1), (u_2, v_2), \dots, (u_n, v_n)$ of $n$ out of the $M$
groups, an efficient estimate will be

    $m' = \dfrac{M \left[ \bar u - \dfrac{s_{uv}}{s_v^2} \left( \bar v - \dfrac{N}{M} \right) \right]}{N}$.

The efficiency of $m'$ relative to the conventional estimate $M \bar u / N$ is
$1/(1 - \rho_{uv}^2)$, which ordinarily would seem to be quite large. This is easily
extended to the case where $\Pi$ is divided into $l$ strata with $M_i$ groups comprising
$N_i$ individuals in the $i$th stratum, when a random sample of $m_i$ out of the $M_i$
groups in each stratum is taken. Let $v_{ij}$ be the number of individuals in the $j$th
group of the $i$th stratum and $u_{ij}$ denote the sum of the values of $x$ for these $v_{ij}$
individuals. The estimate of $m_x$ becomes

    $m'' = \dfrac{1}{N} \sum_{i=1}^{l} M_i \left[ \bar u_i - \dfrac{s_{u_i v_i}}{s_{v_i}^2} \left( \bar v_i - \dfrac{N_i}{M_i} \right) \right]$.

If $\sum_{i=1}^{l} m_i = m$ is fixed, the large sample variance of $m''$ will be a minimum if $m_i$
is proportional to $M_i \sigma_{u_i} \sqrt{1 - \rho_i^2}$, where $\rho_i$ is the correlation between $u$ and $v$
in the $i$th stratum.
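The allocation rule can be sketched in a few lines of Python. The function name, the three strata and their constants below are hypothetical, and the sketch returns fractional sample sizes that would still have to be rounded in practice.

```python
import math

def allocate(m, M, sigma_u, rho):
    # Divide a total of m sampled groups among the strata in proportion to
    # M_i * sigma_{u_i} * sqrt(1 - rho_i^2).
    weights = [Mi * si * math.sqrt(1.0 - ri ** 2)
               for Mi, si, ri in zip(M, sigma_u, rho)]
    total = sum(weights)
    return [m * w / total for w in weights]

# Three hypothetical strata.
sizes = allocate(60, M=[100, 200, 50], sigma_u=[4.0, 2.0, 8.0], rho=[0.6, 0.8, 0.0])
print(sizes)
```

A stratum in which $u$ and $v$ are highly correlated receives fewer sampled groups, since the regression adjustment already removes much of its variance.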

In conclusion, the writer wishes to thank Professor A. Wald for his advice 
and encouragement, and Mr. Henry Goldberg for several suggestions. 
SIGNIFICANCE LEVELS FOR THE RATIO OF THE MEAN SQUARE
SUCCESSIVE DIFFERENCE TO THE VARIANCE


By B. I. HART
Ballistic Research Laboratory, Aberdeen Proving Ground 


For purposes of practical application in connection with significance tests a
tabulation of the argument corresponding to certain percentage points of the
probability integral is usually more convenient than that of the probability
integral for equal intervals of the argument. A table of probabilities for the
ratio of the mean square successive difference $\delta^2$ to the variance $s^2$,

    $P(\delta^2/s^2 < k) = \int_0^k \omega(\delta^2/s^2)\, d(\delta^2/s^2)$,

where $\omega(\delta^2/s^2)$ is the distribution of $\delta^2/s^2$,¹ has been published recently² with
$k$ as argument. The following table of values of $\delta^2/s^2$ for $P = .001$, $.01$ and $.05$
has been computed from it by interpolation.

Since the distribution of $\delta^2/s^2$, $\omega(\delta^2/s^2)$, is symmetric³ about $E(\delta^2/s^2)$,
$P(\delta^2/s^2 < k) = P(\delta^2/s^2 > k')$ if $E(\delta^2/s^2) - k = k' - E(\delta^2/s^2)$, where
$E(\delta^2/s^2) = 2n/(n - 1)$.³ The upper levels are rarely of practical use, since large
values of the ratio, $\delta^2/s^2$, could arise only from a somewhat artificial set of
observations, such as alternately high and low values of the observed variable.
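Should an upper level be wanted, the symmetry relation gives it directly from a tabulated lower level as $k' = 2E(\delta^2/s^2) - k$. A minimal Python sketch follows; the function name is hypothetical, and the lower level used in the example is an arbitrary illustrative number, not an entry of the table.

```python
def upper_level(k, n):
    # Reflect a lower significance level k about E(delta^2 / s^2) = 2n / (n - 1).
    mean = 2.0 * n / (n - 1)
    return 2.0 * mean - k

# Illustrative call with an arbitrary lower level k = 1.0 for samples of n = 11.
k_upper = upper_level(1.0, 11)
print(k_upper)
```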

The computation of this table of significance levels was made at the sugges- 
tion of Lt. Col. L. E. Simon. 


1 For determination of $\omega(\delta^2/s^2)$ cf. John von Neumann, "Distribution of the ratio of the
mean square successive difference to the variance," Annals of Math. Stat., Vol. 12 (1941),
pp. 367-395.

2 B. I. Hart, "Tabulation of the probabilities for the ratio of the mean square successive
difference to the variance," Annals of Math. Stat., Vol. 13 (1942), p. 213.

3 Loc. cit.¹ p. 372 for proof of symmetry and evaluation of $E(\delta^2/s^2)$.


A CORRECTION 


By M. A. GIRSHICK
U.S. Department of Agriculture, Washington 


In my article "Notes on the Distribution of Roots of a Polynomial with
Random Complex Coefficients" which appeared in the June 1942 issue of the
Annals of Mathematical Statistics, the symbol $\sum_{p=1} \sum_{q=p+1}$ in formulas (13), (14),
and (15) should be replaced by $\prod_{p=1} \prod_{q=p+1}$.





REPORT OF THE POUGHKEEPSIE MEETING OF THE INSTITUTE 


The Fifth Summer Meeting of the Institute of Mathematical Statistics was 
held at Vassar College, Tuesday and Wednesday, September 8-9, 1942, in 
conjunction with the meetings of the American Mathematical Society and the 
Mathematical Association of America. The following fifty-eight members of 
the Institute attended the meeting: 


K. J. Arnold, L. A. Aroian, K. J. Arrow, Walter Bartky, Felix Bernstein, C. I. Bliss, 
A. H. Bowker, J. H. Bushey, Belle Calderon, B. H. Camp, A. C. Cohen, Jr., A. H. Copeland, 
C. C. Craig, J. H. Curtiss, W. E. Deming, J. L. Doob, M. L. Elveback, Willy Feller, M. M. 
Flood, R. M. Foster, H. A. Freeman, T. N. E. Greville, C. C. Grove, E. J. Gumbel, Edward 
Helly, G. M. Hopper, Harold Hotelling, Dunham Jackson, R. E. Jolliffe, Irving Kaplansky, 
Karl Karsten, B. F. Kimball, Howard Levene, Eugene Lukacs, H. B. Mann, E. B. Mode, 
E. C. Molina, F. C. Mosteller, C. R. Mummery, M. L. Norden, E. G. Olds, Oystein Ore, 
Edward Paulson, Selby Robinson, F. E. Satterthwaite, Henry Scheffé, L. E. Simon, Morti- 
mer Spiegelman, Arthur Stein, J. R. Tomlinson, A. W. Tucker, J. W. Tukey, D. F. Votaw, 
Jr., Abraham Wald, S. S. Wilks, E. W. Wilson, Jacob Wolfowitz, L. C. Young. 


The opening session, on Tuesday afternoon, was devoted to contributed 
papers on Probability and Statistics and was held jointly with the American 
Mathematical Society. The Chairman was Professor Cecil C. Craig, Uni- 
versity of Michigan, and the following papers were presented: 


1. On the Theory of Testing Composite Hypotheses With One Constraint. 
Henry Scheffé, Princeton University. 
2. On the Consistency of a Class of Non-parametric Statistics. 
Jacob Wolfowitz, Staten Island, N. Y. 
3. Graphical Controls Based on Serial Numbers. 
E. J. Gumbel, New School for Social Research. 
4. Significance Tests for Multivariate Distributions. 
D. S. Villars, United States Rubber Company. (Introduced by E. G. Olds.) 
5. On the Choice of the Number of Class Intervals in the Application of the Chi-square 
Test. 
H. B. Mann and Abraham Wald, Columbia University. 
6. Generalized Poisson Distribution. 
F. E. Satterthwaite, Aetna Life Insurance Company. 
7. The Relationship of Fisher’s z Distribution to Student’s t Distribution. 
Leo A. Aroian, Hunter College. 
8. On a Statistical Problem Arising in the Classification of an Individual In One of Two 
Groups. 
Abraham Wald, Columbia University. 
9. Modern Statistical Methods in Penology. 
Saly R. R. Struik, Radcliffe College. 
Miriam van Waters, Framingham, Mass. 
10. Regularity of Label-sequences Under Configuration Transformations. 
T. N. E. Greville, Bureau of the Census. 
By Title: 
On the Ratio of the Variances of Two Normal Populations. 
Henry Scheffé, Princeton University. 
Abstracts of these papers follow this report. 


On Wednesday morning Professor Harold Hotelling, Columbia University, 
acted as Chairman of a session on Stochastic Processes. The following papers 
were presented: 


1. Persistence and Recurrence. 
A. H. Copeland, University of Michigan. 
2. Definitions and Some Practical Applications. 
Willy Feller, Brown University. 
3. General Theory and Applications to Physics. 
J. L. Doob, University of Illinois. 


The session on Wednesday afternoon was held jointly with the American 
Mathematical Society. Lt. Col. Leslie E. Simon, U.S. A., served as Chairman, 
and the following papers on The Applicability of Mathematical Statistics to War 
Efforts were presented: 


1. Statistical Prediction, With Special Reference to the Problem of Tolerance Limits. 
S. S. Wilks, Princeton University. 
Discussant: J. H. Curtiss, Cornell University. 
2. On the Nature of Mathematical Statistics in Quality Control. 
W. Edwards Deming, Bureau of the Census. 
Discussant: Walter Bartky, University of Chicago. 


A meeting of the Board of Directors was held on Tuesday evening. Following 
the joint dinner on Wednesday evening, a concert was given in Skinner Hall by 
members of the music department of Vassar College. 


Edwin G. Olds, 
Secretary 




















ABSTRACTS OF PAPERS 


(Presented on September 8, 1942, at the Poughkeepsie meeting of the Institute) 


On the Theory of Testing Composite Hypotheses with One Constraint. Henry 
Scheffé, Princeton University. 


A composite hypothesis with one constraint specifies the value of one and only one 
parameter of a set occurring in a distribution function. The theory of testing such 
hypotheses is not only of direct interest for many important problems, but is intimately 
related to Neyman’s theory of confidence intervals (Phil. Trans. Roy. Soc. London, 1937). 
A method of Neyman (Bull. Soc. Math. France, 1935) for finding type B regions for testing 
these hypotheses is extended to the case of any number of nuisance parameters. Type $B_1$ 
regions are defined by generalizing the type $A_1$ regions of Neyman and Pearson (Stat. Res. 
Mem., 1936) to the case where nuisance parameters are present, and sufficient conditions 
are found that a type B region be also of type $B_1$. An interesting moment problem is 
encountered, in which the admissible functions are not of constant sign, and is solved for 
the case where the original distribution is multivariate normal. 


On the Consistency of a Class of Non-Parametric Statistics. J. Wolfowitz, 
N. Y. City. 


Let X and Y be two stochastic variables about whose distribution nothing is known 
except that they are continuous and let it be required to test whether their distribution 
functions are the same. Let V be the observed sequence of zeros and ones constructed as 
described elsewhere (Wald and Wolfowitz, Annals of Math. Stat., Vol. 11 (1940), p. 148). 
Suppose that the statistic S(V) used to test the hypothesis is of the form $S(V) = \sum_j \varphi(l_j)$, 
where $l_j$ is the length of the j-th run and $\varphi(x)$ a suitable function defined for all positive 
integral x. The notion of consistency, originated by Fisher for parametric problems, has 
already been extended to the non-parametric case (loc. cit., p. 153). The author now 
proves that, subject to reasonable conditions on $\varphi(x)$ and statistically unimportant 
restrictions on the alternatives to the null hypothesis, statistics of the type S(V) are consistent. 
In particular, a statistic discussed by the author (Annals of Math. Stat. September, 1942) 
and for which $\varphi(x) = \log\left(\frac{x^x}{x!}\right)$ belongs to the class covered by the theorem. 
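
As an illustration of a statistic of the form $S(V) = \sum_j \varphi(l_j)$, the following Python sketch extracts the run lengths of a sequence of zeros and ones and sums a user-supplied function over them. The names `run_lengths`, `run_statistic` and `phi_log` are hypothetical, and the choice $\varphi(x) = \log(x^x/x!)$ is assumed here only for the sake of the example.

```python
import math
from itertools import groupby


def run_lengths(v):
    """Return the lengths of the maximal runs of equal symbols in v."""
    return [sum(1 for _ in group) for _, group in groupby(v)]


def run_statistic(v, phi):
    """Compute S(V) as the sum of phi(l_j) over the runs of v."""
    return sum(phi(length) for length in run_lengths(v))


def phi_log(x):
    """Assumed example function: log(x^x / x!) for a positive integer x."""
    return x * math.log(x) - math.lgamma(x + 1)


v = [0, 0, 1, 0, 1, 1, 1]          # illustrative sequence, not data from the paper
print(run_lengths(v))
print(run_statistic(v, phi_log))
```

Any other function of the run length can be passed as `phi` without changing the rest of the sketch.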


Graphical Controls Based on Serial Numbers. E. J. Gumbel, New School 
for Social Research. 





The index m of the observed value $x_m$ (m = 1, 2, ..., n) is called its serial number. A 
value x of a continuous statistical variable defined by a probability $W(x) = \lambda$ is called a 
grade (e.g. the median for $\lambda = \tfrac{1}{2}$). The coordination of serial numbers with grades furnishes 
two graphical methods for comparing the observations and the theory, namely the 
equiprobability test based on $m = n\lambda$, and the return periods based on $m = n\lambda + \tfrac{1}{2}$. 

Starting from the distribution of the mth value, we determine the most probable serial 
number $m = n\lambda + \Delta$, where $\Delta$ depends upon the distribution. For a symmetrical 
distribution, the corrections $\Delta$ for two grades defined by $\lambda$ and $1 - \lambda$ are equal in absolute 
value and opposite in sign. Then no correction is needed for the median. For an 
asymmetrical distribution, we calculate the most probable serial number of the mode 
considered as an mth value. Thus the mode is obtained from the observations through the 
theory. In this case the mode is not the most precise mth value. 

If m is of the order $\tfrac{1}{2}n$, the distribution of the mth value converges towards a normal 
distribution with an expectation given by $m = nW(x)$, and a standard deviation $s(x)$, where 
$s(x)\sqrt{n} = \sqrt{W(x)(1 - W(x))}/w(x)$, $w(x)$ being the density corresponding to $W(x)$. By 
attributing to each theoretical value x its standard deviation, we obtain intervals $x \pm s(x)$ 
which may be used for the control of the equiprobability test, the comparison of the 
observed step function with the frequency, and the comparison of the observed with the 
theoretical return periods. Besides, the standard error of the mth value leads to the 
precision of the determination of a constant obtained from a grade. 
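
The control intervals can be sketched numerically. The Python fragment below assumes the large-sample formula $s(x)\sqrt{n} = \sqrt{W(x)(1 - W(x))}/w(x)$, with $w$ the density of $W$, and takes a standard normal distribution as the theoretical law purely for illustration. The function name `control_band` and the sample size are hypothetical.

```python
import math
from statistics import NormalDist


def control_band(lam, n, dist=NormalDist()):
    """Return (x, s) for the grade x with W(x) = lam in a sample of size n.

    s is the assumed large-sample standard deviation of the m-th value,
    s(x) = sqrt(W(x) * (1 - W(x)) / n) / w(x), with w the density of W.
    """
    x = dist.inv_cdf(lam)
    s = math.sqrt(lam * (1.0 - lam) / n) / dist.pdf(x)
    return x, s


n = 100  # illustrative sample size
for lam in (0.25, 0.5, 0.75):
    x, s = control_band(lam, n)
    m = n * lam + 0.5  # serial number used for the return periods
    print(f"lambda={lam:.2f}  m={m:.1f}  band=({x - s:.3f}, {x + s:.3f})")
```

For a symmetrical law such as the normal, the bands at $\lambda$ and $1 - \lambda$ have the same width, and the band is narrowest at the median.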


Significance Tests for Multivariate Distributions. D. S. Villars, U. S. Rubber 
Company. 


The observed mean of sets of m variates, each normally and independently distributed, 
is distributed around the population mean according to a $\chi^2$ distribution with m degrees 
of freedom. The sum of squares of deviations of n observed points from the observed mean 
is distributed as $\chi^2$ with $m(n - 1)$ degrees of freedom (not with $nm - 1$). A much more 
powerful test for correlation than that by the correlation coefficient is described, which for 
bivariate distributions, involves comparisons between $n - 1$ and $n - 1$ degrees of freedom. 
This can be extended to $m - 1$ tests with m variates. Distribution of distance between two 
means and distribution of fiducial radius are worked out in detail for two variates. 


On the Choice of the Number of Class Intervals in the Application of the Chi- 
Square Test. H. B. Mann and A. Wald, Columbia University. 


The distance of two distribution functions is defined as the l.u.b. of the absolute value 
of the difference between the two cumulative distribution functions. Let $C(\Delta)$ be the class 
of alternatives with distance $\geq \Delta$ from the null-hypothesis. Let $f(N, k, \Delta)$ be the g.l.b. 
of the power with respect to alternatives in $C(\Delta)$ of the chi-square test with sample size N 
and k equally probable class intervals. A positive integer k is called best with respect to 
sample size N if there exists a $\Delta$ such that $f(N, k, \Delta) = \tfrac{1}{2}$ and $f(N, k', \Delta) \leq \tfrac{1}{2}$ for every 
positive integer $k'$. The authors show that 

$$k_N = 4\sqrt[5]{\frac{2(N-1)^2}{c^2}}, \quad \text{where} \quad \frac{1}{\sqrt{2\pi}} \int_c^\infty e^{-x^2/2}\,dx$$

is equal to the size of the critical region, fulfills approximately the conditions of a best k 
with $\Delta_N = \frac{5}{k_N} - \frac{4}{k_N^2}$ as the corresponding value of $\Delta$. The approximation is shown to be 
satisfactory for $N \geq 450$ if the 5% level of significance is used and for $N \geq 300$ if the 1% 
level is used. 
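
A rough numerical sketch of the rule for the number of class intervals follows. It assumes that the rule reads $k_N = 4\,(2(N-1)^2/c^2)^{1/5}$, with $c$ the point of the standard normal distribution whose upper tail area equals the size of the critical region. That reading of the formula and the function name `mann_wald_k` are assumptions made for illustration.

```python
from statistics import NormalDist


def mann_wald_k(n, alpha):
    """Approximate number of class intervals under the assumed rule
    k_N = 4 * (2 * (N - 1)**2 / c**2) ** (1/5)."""
    # c is the upper-tail normal point: P(Z > c) = alpha.
    c = NormalDist().inv_cdf(1.0 - alpha)
    return 4.0 * (2.0 * (n - 1) ** 2 / c ** 2) ** 0.2


# Sample sizes and levels mentioned in the abstract.
for n, alpha in ((450, 0.05), (300, 0.01)):
    print(n, alpha, round(mann_wald_k(n, alpha), 1))
```

In practice the value would be rounded to an integer, since k counts class intervals.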


Generalized Poisson Distribution. F. E. Satterthwaite, Aetna Life 
Insurance Company. 


In this paper the Poisson distribution is generalized to allow for the assignment of 
varying weights to a set of events when the number of events follows the Poisson law. 
The development used brings out the fact that distributions falling in this class do not 
require that the underlying statistics be homogeneous. The only requirement is that they 
be independent. Formulas are given for the moments of the generalized distribution as 
functions of the moments of the underlying distribution of weights. The principles to 
be observed in the solution of practical problems are outlined. 


The Relationship of Fisher’s z Distribution to Student’s t Distribution. Leo 
A. Aroian, Hunter College. 


For $n_1$ and $n_2$ sufficiently large, W = [… N/(N + 1) …] is distributed as Student’s t with N 
degrees of freedom, $N = n_1 + n_2 - 1$, $\beta^2 = \frac{1}{2}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)$. If the level of significance is $\alpha$ for 
Student’s distribution, the level of significance for z will be […]. As a 
corollary it follows that the distribution of z approaches normality, $n_1, n_2 \to \infty$, with mean 
zero and variance $\frac{1}{2}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)$. This simplifies a previous proof of the author. Application 
of this result is made to finding levels of significance of the z distribution. On the whole 
R. A. Fisher’s formulas for this purpose, $n_1$ and $n_2$ large, as modified by W. G. Cochran are 
superior. The results given by the Fisher-Cochran formulas are compared with those 
obtained by using the formula recently found by E. Paulson. 
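
The limiting normal law gives a crude way of finding levels of significance for z. The Python sketch below uses only the stated limit (mean zero, variance $\frac{1}{2}(1/n_1 + 1/n_2)$) together with the standard relation $z = \frac{1}{2}\log F$. The function name `z_normal_critical` is hypothetical, and the sketch does not reproduce the Fisher-Cochran or Paulson formulas, which the abstract reports as more accurate.

```python
import math
from statistics import NormalDist


def z_normal_critical(n1, n2, alpha):
    """Upper-tail critical value of z from the limiting normal law
    with mean 0 and variance (1/n1 + 1/n2) / 2."""
    beta = math.sqrt(0.5 * (1.0 / n1 + 1.0 / n2))
    return beta * NormalDist().inv_cdf(1.0 - alpha)


# Illustrative degrees of freedom and level.
z_crit = z_normal_critical(50, 50, 0.05)
f_crit = math.exp(2.0 * z_crit)  # corresponding F value, since z = (1/2) log F
print(z_crit, f_crit)
```

The approximation can be expected to be serviceable only when both degrees of freedom are large.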






On a Statistical Problem Arising in the Classification of an Individual in One of 
Two Groups. ABRAHAM WALD, Columbia University. 


Let $\pi_1$ and $\pi_2$ be two p-variate normal populations which have a common covariance 
matrix. A sample of size $N_i$ is drawn from the population $\pi_i$ (i = 1, 2). Denote by $x_{i\alpha}$ 
the $\alpha$-th observation on the i-th variate in $\pi_1$, and by $y_{i\beta}$ the $\beta$-th observation on the i-th 
variate in $\pi_2$. Let $z_i$ (i = 1, ..., p) be a single observation on the i-th variate drawn from a 
population $\pi$ where it is known that $\pi$ is equal either to $\pi_1$ or to $\pi_2$. The parameters of 
the populations $\pi_1$ and $\pi_2$ are assumed to be unknown. It is shown that for testing the 
hypothesis $\pi = \pi_1$ a proper critical region is given by $U > d$ where 

$$U = \sum_i \sum_j s^{ij} z_i (\bar{y}_j - \bar{x}_j), \qquad \|s^{ij}\| = \|s_{ij}\|^{-1},$$

$$s_{ij} = \Bigl[\sum_\alpha (x_{i\alpha} - \bar{x}_i)(x_{j\alpha} - \bar{x}_j) + \sum_\beta (y_{i\beta} - \bar{y}_i)(y_{j\beta} - \bar{y}_j)\Bigr] / (N_1 + N_2 - 2),$$

$\bar{x}_i = (\sum_\alpha x_{i\alpha})/N_1$, $\bar{y}_i = (\sum_\beta y_{i\beta})/N_2$ and d is a constant. The large sample distribution 
of U is derived and it is shown that U is a simple function of three angles in the sample 
space whose exact joint sampling distribution is derived. 
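
The statistic U can be computed directly from its definition. The following Python sketch does so under the assumption that U is the bilinear form $\sum_i \sum_j s^{ij} z_i (\bar{y}_j - \bar{x}_j)$ with the pooled matrix $s_{ij}$ using the divisor $N_1 + N_2 - 2$. The helper names (`mean_vector`, `pooled_covariance`, `solve`, `wald_u`) and the data are hypothetical. Instead of inverting the matrix, the sketch solves a linear system, which gives the same value.

```python
def mean_vector(rows):
    """Component-wise mean of a list of p-variate observations."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def pooled_covariance(xs, ys):
    """Pooled matrix s_ij with divisor N1 + N2 - 2."""
    xbar, ybar = mean_vector(xs), mean_vector(ys)
    p = len(xbar)
    s = [[0.0] * p for _ in range(p)]
    for rows, mean in ((xs, xbar), (ys, ybar)):
        for row in rows:
            d = [v - m for v, m in zip(row, mean)]
            for i in range(p):
                for j in range(p):
                    s[i][j] += d[i] * d[j]
    denom = len(xs) + len(ys) - 2
    return [[v / denom for v in row] for row in s]


def solve(a, b):
    """Solve a w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for k in range(n):
        piv = max(range(k, n), key=lambda r: abs(m[r][k]))
        m[k], m[piv] = m[piv], m[k]
        for r in range(k + 1, n):
            f = m[r][k] / m[k][k]
            for c in range(k, n + 1):
                m[r][c] -= f * m[k][c]
    w = [0.0] * n
    for k in reversed(range(n)):
        w[k] = (m[k][n] - sum(m[k][c] * w[c] for c in range(k + 1, n))) / m[k][k]
    return w


def wald_u(z, xs, ys):
    """U = sum_i sum_j s^{ij} z_i (ybar_j - xbar_j)."""
    xbar, ybar = mean_vector(xs), mean_vector(ys)
    w = solve(pooled_covariance(xs, ys), [y - x for x, y in zip(xbar, ybar)])
    return sum(zi * wi for zi, wi in zip(z, w))


# Illustrative samples from the two populations (made-up numbers).
xs = [[0, 0], [2, 0], [0, 2], [2, 2]]
ys = [[4, 4], [6, 4], [4, 6], [6, 6]]
print(wald_u([1, 1], xs, ys), wald_u([5, 5], xs, ys))
```

An observation lying near the second sample yields the larger value of U, in agreement with a critical region of the form $U > d$ for the hypothesis $\pi = \pi_1$.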


Modern Statistical Methods in Penology. Saly R. R. Struik, Radcliffe 
College and Miriam van Waters, Massachusetts Reformatory for Women. 


In applying statistical methods to penological problems, so far the best known studies 
have considered 100, 500, or once in England (to refute Lombroso’s theory) 1500 cases. 
But from the correct statistical standpoint, far more cases are needed to establish a law. 
Over a period of years, an attempt has been made to use statistical methods in the study 
of penological problems in the Massachusetts Reformatory for Women, but the results 
will take on real significance and be conclusive only when similar investigations are made 
all over the United States. 


Regularity of Label-Sequences Under Configuration Transformations. T. N. E. 
Greville, Bureau of the Census. 


There is developed a class of transformations on sequences of arbitrary labels in terms 
of which a wide variety of problems in the theory of probability can be formulated. It is 
shown that, with mild restrictions on the transformations used and on the measure function 
assumed on the label-space, almost every label-sequence produces a transform having the 
frequency distribution expected. The class of transformations considered is shown to 
include as special cases the four fundamental operations of von Mises: place selection, 
partition, mixing, and combination. 


On the Ratio of the Variances of Two Normal Populations. Henry Scheffé, 
Princeton University. 


Let $\theta$ be the above ratio. The two problems considered in this paper are the formulation 
and comparison of (i) significance tests for the hypothesis $\theta = \theta_0$, and (ii) confidence 
intervals for $\theta$. The paper is divided into two parts; the first is kept on an elementary level 
and only solutions based on the F-distribution are considered. Following various 
approaches, six tests and corresponding sets of confidence intervals are introduced. It turns 
out that the limits on the F-distribution which yield an unbiased test are the same as those 
which yield confidence intervals optimum in a certain intuitive sense. The values of these 
limits are difficult to compute and some numerical data are given to indicate the loss of 
efficiency in using instead the easily obtained “equal tails” limits. The second part of 
the paper is concerned with the existence of common best critical regions and type $B_1$ 
regions, and the application of Neyman’s theory of confidence intervals. No new tests or 
confidence intervals not already considered in part I are obtained, but those previously 
judged best of a very narrow class are now shown to be best of all those based on similar 
regions of the same size. 